Abstract

This report summarizes the results of research performed under Grant
AFOSR-74-2601 for the period September 1, 1973 to August 31, 1978. The
research was concerned with problems in communications theory, information
theory and coding theory. The report contains a short summary of the various
research topics as well as the full text of the publications that have
resulted from this work.


I. Introduction

This  is  the  final  technical  report  summarizing  research  conducted  under 
Grant  AFOSR-74-2601  sponsored  by  the  Air  Force  Office  of  Scientific  Research 
for  the  period  September  1,  1973  to  August  31,  1978.  Four  interim  scientific 
reports  were  submitted  summarizing  research  during  each  of  the  first  four  years 
of  the  grant.  The  work  was  performed  at  the  University  of  Massachusetts,  Amherst, 
Massachusetts. 

The  work  was  primarily  concerned  with  problems  in  communications  theory, 
information  theory  and  coding  theory.  A short  summary  of  some  of  this  work  is 
given  in  section  II.  Many  problems  were  treated  and  in  almost  all  cases  the 
results  were  published  in  journals  or  were  presented  at  conferences.  The  third 
section  of  this  report  consists  of  a compendium  of  these  journal  articles  and 
conference  papers.  The  fourth  section  of  this  report  consists  of  abstracts  of 
Ph.D.  dissertations  which  were  supported  under  this  grant. 
II. Summary of Research

One area of research in which work was conducted is that of source coding,
which has applications in data reduction for efficient transmission or storage
of information. In particular, this work concentrated on the efficient
representation of the outputs of several information sources where the outputs
of the sources are correlated with one another. In a previous paper (Slepian
and Wolf, "Noiseless Coding of Correlated Information Sources," IEEE Transactions
on Information Theory, vol. IT-19, pp. 471-480, 1973), it was shown that two
correlated information sources (X and Y) could be separately encoded at rates
$R_x = H(X)$ and $R_y = H(Y|X)$ and that these encoded message streams could
then be decoded to the original message streams with arbitrarily small error
probability. The techniques used in obtaining this result were found to be
applicable to several new problems in data reduction. The details of these
problems and the results obtained are summarized in papers given in Section III.
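
As a small numerical illustration of the rate pair quoted above, the following Python sketch computes H(X), H(Y|X) and H(X,Y) for a joint distribution. The distribution used (a fair bit X and a copy Y flipped with probability 0.1) and the function names are assumptions chosen for illustration; they do not come from the report.

```python
from math import log2

def entropy(probs):
    """Entropy in bits of a list of probabilities; zero terms are skipped."""
    return -sum(p * log2(p) for p in probs if p > 0)

def slepian_wolf_corner(joint):
    """joint[x][y] = P(X = x, Y = y). Returns (H(X), H(Y|X), H(X,Y))."""
    p_x = [sum(row) for row in joint]
    h_x = entropy(p_x)
    h_xy = entropy([p for row in joint for p in row])
    # Chain rule: H(Y|X) = H(X,Y) - H(X).
    return h_x, h_xy - h_x, h_xy

# Hypothetical source pair: X is a fair bit, Y equals X flipped with prob. 0.1.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
h_x, h_y_given_x, h_xy = slepian_wolf_corner(joint)
print(f"R_x = H(X) = {h_x:.3f}  R_y = H(Y|X) = {h_y_given_x:.3f}  sum = {h_xy:.3f}")
```

The sum of the two rates equals the joint entropy H(X,Y), which is the smallest total rate at which the pair could be encoded even by a single joint encoder.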

A second research problem considered in depth is the application of
communication theory and information theory to automating the function of a
technical controller. The general problem is to monitor the performance of a
communication system in order to assess its reliability and to identify and
isolate marginal or faulty links.

One application of information theory to this problem is as follows. Assume
that an error correcting code is used at a rate below the capacity of the
channel. Furthermore, assume that the decoder does not need to know the channel
parameters in order to decode. Then if the code is such that a small decoding
error probability results, the decoder also has a good measure of the reliability
of the link since it can monitor the number of errors corrected. This is the
case even though the decoder does not know (a priori) the transmitted sequence.
This technique should be contrasted with a technique whereby traffic is disrupted
and a special test signal transmitted in order to measure the reliability of the
channel.

Another method of automatically estimating the error probability in a
digital communications system by observing only the received signal was studied.
This method had the following characteristics:

(a) It did not require the transmission of any test sequences nor did it
require the messages to have any particular format.

(b) The error rate could be estimated on short sequences that possibly
did not contain any errors.

This system is based upon a performance monitoring unit (PMU) that was proposed
by D. J. Gooding ("Performance Monitor Technique for Digital Receivers Based Upon
Extrapolation of Error Rate," IEEE Transactions on Communications Technology,
Vol. COM-16, No. 3, June 1968, pp. 380-387).

A complete  system  including  a modulator,  channel  simulator,  receiver  and 
PMU  was  designed  and  constructed  on  four  printed  circuit  boards.  The  total  cost 
of  the  system  was  approximately  one-hundred  dollars. 

The  system  was  calibrated  during  test  runs  whereby  measurements  made  by 
the  PMU  were  compared  with  actual  error  counts.  After  calibration,  the  system 
was  run  for  various  levels  of  signal  to  noise  ratio  and  excellent  correspondence 
was  found  between  the  predicted  error  rates  and  actual  error  counts. 

A new method was found for achieving maximum likelihood detection of the
$q^k$ code words in an (n,k) linear block code with symbols from GF(q). This
method can utilize soft decisions. The complexity of the method grows
exponentially with the number of parity symbols rather than with the number of
message symbols. This method was successfully applied to various types of fading
communications channels.
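
The report does not give the details of the method at this point. One way to obtain a maximum likelihood decoder whose state count is governed by the number of parity symbols is a Viterbi search over partial syndromes. The Python sketch below does this for a binary code (q = 2) and should be read as an illustration under that assumption, not as the authors' implementation. The (7,4) Hamming parity check matrix, the signal mapping (bit b sent as 1 - 2b) and all function names are assumptions.

```python
from itertools import product

def syndrome_trellis_decode(H, r):
    """Soft-decision ML decoding of a binary linear code on a trellis whose
    states are partial syndromes (at most 2**(n-k) states per stage).
    H: parity check matrix as a list of 0/1 rows.
    r: received real values, assuming bit b was sent as 1 - 2*b."""
    n = len(H[0])
    # Column j of H packed into one integer.
    cols = [sum(H[i][j] << i for i in range(len(H))) for j in range(n)]
    survivors = {0: (0.0, [])}            # state -> (metric, bits so far)
    for j in range(n):
        nxt = {}
        for state, (metric, bits) in survivors.items():
            for b in (0, 1):
                s2 = state ^ (cols[j] if b else 0)
                m2 = metric + (r[j] - (1 - 2 * b)) ** 2
                if s2 not in nxt or m2 < nxt[s2][0]:
                    nxt[s2] = (m2, bits + [b])
        survivors = nxt
    return survivors[0][1]                # zero final syndrome = code word

def brute_force_decode(H, r):
    """Reference decoder: test every binary n-tuple, keep code words only."""
    n = len(H[0])
    best = None
    for word in product((0, 1), repeat=n):
        if any(sum(h * c for h, c in zip(row, word)) % 2 for row in H):
            continue                      # nonzero syndrome, not a code word
        metric = sum((x - (1 - 2 * c)) ** 2 for x, c in zip(r, word))
        if best is None or metric < best[0]:
            best = (metric, list(word))
    return best[1]

H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
r = [0.9, 1.1, -0.2, 0.8, 1.0, 0.3, 1.2]  # all-zero word sent, one weak sample
decoded = syndrome_trellis_decode(H, r)
print(decoded)
```

For this (7,4) code the trellis never holds more than 2^3 = 8 states, whereas a direct search over code words would compare 2^4 = 16 candidates; the gap widens for high-rate codes.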

A particular application of the Chinese Remainder Theorem to the design
of fault tolerant computers has been investigated. A brief summary of this work
follows.

The basic theory to be used is the following: Let $m_1, m_2, \ldots, m_L$ be
L positive integers that are relatively prime in pairs. Let I be any
non-negative integer less than $m = \prod_{i=1}^{L} m_i$. Then I can be uniquely
reconstructed from its remainders $r_1, r_2, \ldots, r_L$, where

    $I = Q_i m_i + r_i, \quad 0 \le r_i < m_i, \quad i = 1, 2, \ldots, L.$

An example is given in the following table for $m_1 = 2$, $m_2 = 3$ and $m_3 = 5$.
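
The correspondence between integers and remainder triples for these three moduli can be generated mechanically. The following Python sketch lists the remainders of every I below 2·3·5 = 30 and checks that each triple identifies its integer; the function names and the sieve-style reconstruction are illustrative assumptions, not taken from the report.

```python
from math import prod

def residues(i, moduli):
    """Remainders of i with respect to each modulus."""
    return tuple(i % m for m in moduli)

def reconstruct(rems, moduli):
    """Recover I in [0, prod(moduli)) from its remainders.
    The moduli are assumed to be relatively prime in pairs."""
    total, step = 0, 1
    for r, m in zip(rems, moduli):
        # Adding multiples of step keeps all earlier remainders unchanged.
        while total % m != r:
            total += step
        step *= m
    return total

moduli = (2, 3, 5)
table = {i: residues(i, moduli) for i in range(prod(moduli))}
for i, rems in table.items():
    print(i, rems)
```

Every one of the 30 remainder triples occurs exactly once, which is the uniqueness statement of the theorem.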
An important corollary to this theorem is:

Let S be a subset of the integers $m_1, m_2, \ldots, m_L$. If I is an integer in
the range $0 \le I < \prod_{i \in S} m_i$, then I can be uniquely reconstructed
from the remainders corresponding to the $m_i$ in this subset.

To see that this is the case consider the previous example where $m_1 = 2$,
$m_2 = 3$, $m_3 = 5$. Then we can consider three sets S as follows:

    $S = \{m_1, m_2\}$,   $S = \{m_1, m_3\}$,   $S = \{m_2, m_3\}$.

Thus given the remainders $(r_1, r_2, r_3)$, any two of these remainders can
uniquely determine an integer I in the range $0 \le I < 5$.


Finally we consider a second corollary to the Chinese Remainder Theorem:

Let I be a non-negative integer in the range $0 \le I < M$. Let
$m_1 < m_2 < \ldots < m_N$ be N positive integers that are relatively prime in
pairs. Let s be the smallest integer such that $\prod_{i=1}^{s} m_i \ge M$. Then
I can be uniquely determined from any s remainders from the set
$\{r_1, r_2, \ldots, r_N\}$.

Let s and N be defined as in the previous corollary. Consider the set of
remainders $r_1, r_2, \ldots, r_N$ where now F of these remainders are erased
(i.e., are missing) and T of them are in error. Assume that $2T + F \le N - s$.
Then one can uniquely determine I from the remaining N - F unerased remainders,
T of which are in error.

As an example, let

    $m_1 = 97$, $m_2 = 101$, $m_3 = 103$, $m_4 = 107$ and $m_5 = 109$.

Then if $0 \le I < 97 \cdot 101 = 9797$, we have s = 2, N = 5, and N - s = 3.
Then I can be reconstructed if

(a) one remainder is in error, or

(b) one or two remainders are erased.
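
The example can be exercised with a brute-force decoder. The Python sketch below reconstructs a candidate from every s-subset of the unerased remainders and accepts a candidate below M that disagrees with at most T of the unerased remainders, where T is the largest value allowed by 2T + F ≤ N - s. This search strategy, the function names and the test value I = 1234 are assumptions for illustration; the report does not specify a decoding algorithm here. The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
from itertools import combinations
from math import prod

def crt(rems, moduli):
    """Chinese remainder reconstruction for pairwise relatively prime moduli."""
    total, step = 0, 1
    for r, m in zip(rems, moduli):
        total += step * (((r - total) * pow(step, -1, m)) % m)
        step *= m
    return total

def decode(rems, moduli, M):
    """rems[i] is an integer, or None if that remainder is erased.
    Returns I in [0, M), or None if 2T + F <= N - s cannot be met."""
    N = len(moduli)
    ordered = sorted(moduli)
    s = next(k for k in range(1, N + 1) if prod(ordered[:k]) >= M)
    known = [i for i in range(N) if rems[i] is not None]
    F = N - len(known)
    t_max = (N - s - F) // 2          # largest T with 2T + F <= N - s
    if t_max < 0:
        return None
    for subset in combinations(known, s):
        cand = crt([rems[i] for i in subset], [moduli[i] for i in subset])
        disagreements = sum(cand % moduli[i] != rems[i] for i in known)
        if cand < M and disagreements <= t_max:
            return cand
    return None

moduli = [97, 101, 103, 107, 109]
M = 97 * 101
I = 1234                               # hypothetical value to protect
rems = [I % m for m in moduli]
one_error = rems.copy()
one_error[2] = (one_error[2] + 5) % moduli[2]
two_erased = [None, rems[1], None, rems[3], rems[4]]
print(decode(one_error, moduli, M), decode(two_erased, moduli, M))
```

The search over s-subsets is exponential in general and is only meant to make the counting condition concrete for N = 5.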

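We now apply these ideas to fault tolerant computers.

Let $I_1$ and $I_2$ be two integers in the range
$0 \le I_1, I_2 < m = m_1 m_2 \cdots m_N$. Then if $I_1$ and $I_2$ have the
remainders

    $I_1 \leftrightarrow (r_{11}, r_{12}, \ldots, r_{1N})$,

    $I_2 \leftrightarrow (r_{21}, r_{22}, \ldots, r_{2N})$,

then

    $(I_1 \pm I_2)_m \leftrightarrow ((r_{11} \pm r_{21})_{m_1}, (r_{12} \pm r_{22})_{m_2}, \ldots, (r_{1N} \pm r_{2N})_{m_N})$

    $(I_1 \cdot I_2)_m \leftrightarrow ((r_{11} \cdot r_{21})_{m_1}, (r_{12} \cdot r_{22})_{m_2}, \ldots, (r_{1N} \cdot r_{2N})_{m_N})$

where $(x)_y$ means x mod y.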
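
The digit-wise rules above can be checked directly. In the following Python sketch the three moduli, the operand values and the helper names are assumptions chosen for illustration.

```python
from math import prod

MODULI = (97, 101, 103)                # hypothetical pairwise-prime moduli
M_TOTAL = prod(MODULI)

def to_residues(i):
    return tuple(i % m for m in MODULI)

def add(a, b):
    """Digit-wise addition; no carries pass between digits."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def sub(a, b):
    return tuple((x - y) % m for x, y, m in zip(a, b, MODULI))

def mul(a, b):
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

i1, i2 = 123456, 7890
r1, r2 = to_residues(i1), to_residues(i2)
print(add(r1, r2) == to_residues((i1 + i2) % M_TOTAL))
print(sub(r1, r2) == to_residues((i1 - i2) % M_TOTAL))
print(mul(r1, r2) == to_residues((i1 * i2) % M_TOTAL))
```

Each digit is processed independently of the others, which is the source of the speed advantage and also what allows a faulty digit to be treated as an isolated error or erasure.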

This result has been previously suggested for use in a residue number system
computer. The advantage of such a computer is that addition and multiplication
can be very fast. A disadvantage is that it is difficult to compare the
magnitudes of $I_1$ and $I_2$ from their remainders. We will be interested in
considering a fault-tolerant residue number system computer. Let
$m_1 < m_2 < \ldots < m_N$ be pairwise prime, and let $0 \le I_1, I_2 < M$.
Consider the two systems shown below, where each arithmetic box is a box that
does addition, subtraction or multiplication (perhaps modulo some integer).

[Figure: block diagrams of the two systems, separated by a dotted line. Boxes
labeled "Computes remainders from I" feed the arithmetic boxes, and a box
labeled "Computes I from remainders" produces the output.]

From the previous discussion we see that the system below the dotted line will
work if T arithmetic boxes produce faulty outputs and F produce no outputs at
all, where $2T + F \le N - s$.

The system above the dotted line is the non-fault-tolerant version of the system.

... produce errors. Thus one can use an error detection code to convert errors
in these boxes into erasures. One such code that will detect any single carry or
sum error in an adder is to multiply each remainder by 3. Thus we would go
through the following steps.


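a. Compute remainders:

    $I_1 \rightarrow (r_{11}, r_{12}, \ldots, r_{1N})$
    $I_2 \rightarrow (r_{21}, r_{22}, \ldots, r_{2N})$

b. Encode:

    $(r_{11}, r_{12}, \ldots, r_{1N}) \rightarrow (3r_{11}, 3r_{12}, \ldots, 3r_{1N})$
    $(r_{21}, r_{22}, \ldots, r_{2N}) \rightarrow (3r_{21}, 3r_{22}, \ldots, 3r_{2N})$

c. Form (addition; the actual results may contain errors):

    $(3r_{11} + 3r_{21})_{3m_1} = a_1$
    $(3r_{12} + 3r_{22})_{3m_2} = a_2$
    ...
    $(3r_{1N} + 3r_{2N})_{3m_N} = a_N$

d. Calculate $(a_1)_3, (a_2)_3, \ldots, (a_N)_3$.
   If $(a_i)_3 \ne 0$ replace $a_i$ by $\emptyset$ (null symbol). Otherwise let
   $a_i$ alone. Call the result $(a'_1, a'_2, \ldots, a'_N)$. Decode:

    $\hat{a}_i = \emptyset$ if $(a_i)_3 \ne 0$,   $\hat{a}_i = a_i / 3$ if $(a_i)_3 = 0$.

e. Reconstruct $I_1 + I_2$ from $(\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_N)$.

The procedure will work if

    (No. of $\emptyset$'s) + 2 (No. of incorrect $\hat{a}_i$) $\le N - s$.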
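
Steps b through e can be sketched in a few lines of Python. The fault model (an additive error injected into one digit adder), the moduli, the operand values and the function names are assumptions for illustration. The sketch handles erasures only: it assumes every injected fault changes the sum by an amount that is not a multiple of 3, so that step d always catches it.

```python
def protected_add(r1, r2, moduli, faults=None):
    """Steps b-d: add remainders in the 3*r form modulo 3*m_i and erase
    (None) every digit whose sum is not a multiple of 3.
    faults maps a digit index to an additive error in that adder
    (hypothetical fault model)."""
    faults = faults or {}
    out = []
    for i, (a, b, m) in enumerate(zip(r1, r2, moduli)):
        total = (3 * a + 3 * b + faults.get(i, 0)) % (3 * m)
        out.append(total // 3 if total % 3 == 0 else None)
    return out

def reconstruct(rems, moduli):
    """Step e: sieve-style Chinese remainder reconstruction that uses only
    the unerased digits."""
    total, step = 0, 1
    for r, m in zip(rems, moduli):
        if r is None:
            continue
        while total % m != r:
            total += step
        step *= m
    return total

moduli = [97, 101, 103, 107, 109]
i1, i2 = 1000, 2345                     # i1 + i2 < 97 * 101 = 9797
r1 = [i1 % m for m in moduli]
r2 = [i2 % m for m in moduli]
digits = protected_add(r1, r2, moduli, faults={2: 1, 4: 2})
print(digits, reconstruct(digits, moduli))
```

An error of magnitude 1, 2, 4, ... (a single carry or sum bit) is never a multiple of 3, which is why multiplying by 3 detects it; an error that happens to be a multiple of 3 would pass step d and would have to be handled as one of the T errors.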
We have considered various schemes for coding for multi-user communication
channels. One such channel is discussed here: the modulo 2 channel.

We consider a multiple access channel where two users must separately
encode information for a common channel. We assume word and bit synchronization
for the encoders but that they are unaware of the information to be transmitted
by the other user. The channel to be considered is a channel which accepts a
pair of binary symbols and produces as its output a binary symbol which is the
modulo 2 summation of the input symbols.

Let $R_1$ and $R_2$ be the rates of the two users (in bits per channel use). It
is well known that the capacity region of the modulo 2 channel is given by the
equation

    $0 \le R_1 + R_2 \le 1$.

Furthermore any point on the line $R_1 + R_2 = 1$ can be achieved by time sharing
between two modes of operation where in each mode one encoder transmits uncoded
data and the other encoder transmits all zeros.

An alternative scheme exists for achieving the rate pair
$(R_1, R_2) = (k/N, 1 - k/N)$. An (N,k) binary cyclic code is chosen as the code
for encoder 1. This code has generator polynomial g(x) and parity check
polynomial h(x). It is assumed that N is an odd integer so that g(x) and h(x)
have no common factors.

Let encoder 1 transmit a code word from the (N,k) binary code and let
encoder 2 transmit a code word from the (N,N-k) dual code with generator
polynomial h(x). Let $I_1(x)$ be the idempotent for the (N,k) code and let
$I_2(x) = 1 + I_1(x)$ be the idempotent for the dual code.

The decoder then receives a word of the form

    $a(x)g(x) + b(x)h(x)$.

To obtain the code word transmitted by encoder 1, it multiplies by $I_1(x)$,
modulo $x^N - 1$. To obtain the code word transmitted by encoder 2, it
multiplies by $I_2(x)$, modulo $x^N - 1$.
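
The separation by idempotents can be verified exhaustively for a short code. The Python sketch below uses N = 7 with g(x) = x^3 + x + 1 and h(x) = x^4 + x^2 + x + 1, a hypothetical choice that is not taken from the report. Polynomials over GF(2) are stored as integer bit masks, and each idempotent is found by a small search rather than by a closed-form construction.

```python
N = 7
G = 0b1011        # g(x) = x^3 + x + 1, generates a (7,4) cyclic code (assumed)
H = 0b10111       # h(x) = (x^7 + 1) / g(x) = x^4 + x^2 + x + 1

def mul_mod(a, b, n=N):
    """Product of two GF(2) polynomials (bit masks of degree < n),
    reduced modulo x^n + 1, i.e. with cyclic wrap-around."""
    mask = (1 << n) - 1
    out = 0
    for i in range(n):
        if (a >> i) & 1:
            out ^= ((b << i) | (b >> (n - i))) & mask   # b(x) * x^i
    return out

def idempotent(gen, k):
    """The code word e(x) of the code generated by gen with e(x)c(x) = c(x)
    for every code word c(x); found by testing all 2**k - 1 nonzero words."""
    for a in range(1, 1 << k):
        e = mul_mod(a, gen)
        if mul_mod(e, gen) == gen:
            return e
    return None

I1 = idempotent(G, 4)       # idempotent of the code used by encoder 1
I2 = idempotent(H, 3)       # idempotent of the code used by encoder 2

a, b = 0b1101, 0b011        # arbitrary messages for the two encoders
received = mul_mod(a, G) ^ mul_mod(b, H)   # modulo 2 sum seen by the decoder
print(mul_mod(received, I1) == mul_mod(a, G),
      mul_mod(received, I2) == mul_mod(b, H))
```

The separation works because g(x)h(x) is a multiple of x^N - 1, so multiplying by one idempotent annihilates every word of the other code.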

This scheme is a special case of the following: Encoder 1 transmits
a word from an (N,k) binary code. A coset table is formed where the coset
leaders form an (N,N-k) binary group code. The receiver receives a word
in the coset table, say in the i-th row and j-th column. It then decodes to
the j-th code word used by encoder 1 and the i-th code word used by encoder 2.

Let us consider a modulo 2 channel with errors as the cascade of the
modulo 2 channel without errors and a binary symmetric channel with crossover
probability p. The capacity region for this channel is given by the
equation

    $0 \le R_1 + R_2 \le 1 - h(p)$

where h(p) is the entropy function.
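
For concreteness, the sum-rate bound 1 - h(p) and a time-sharing split of it can be evaluated as follows. The function names and the time-sharing fraction `lam` are assumptions introduced for this Python sketch.

```python
from math import log2

def binary_entropy(p):
    """h(p) = -p log2 p - (1 - p) log2 (1 - p), with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def sum_capacity(p):
    """Largest R1 + R2 for the modulo 2 channel followed by a BSC(p)."""
    return 1.0 - binary_entropy(p)

def time_share(p, lam):
    """Rate pair on the boundary when user 1 is active a fraction lam of
    the time and user 2 the remaining fraction."""
    c = sum_capacity(p)
    return lam * c, (1 - lam) * c

for p in (0.0, 0.01, 0.11, 0.5):
    print(p, round(sum_capacity(p), 4), time_share(p, 0.25))
```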

One approach to coding for such a channel is to time share between two
modes of operation where in one mode one encoder uses a t error correcting code
(say a BCH code) while the other encoder sends all zeros. In the other mode
the encoders switch roles.

Another approach is as follows: Let g(x) be the generator polynomial
of a binary cyclic code which corrects t errors. Let
$x^N - 1 = g(x)h_1(x)h_2(x)$ where N is odd so that g(x), $h_1(x)$ and $h_2(x)$
have no common factors. Let encoder 1 use code words from a cyclic code with
generator polynomial $g(x)h_1(x)$ while encoder 2 uses code words from a cyclic
code with generator polynomial $g(x)h_2(x)$.

The received word is of the form

    $a(x)g(x)h_1(x) + b(x)g(x)h_2(x) + n(x) = \alpha(x)g(x) + n(x)$,

where $\alpha(x) = a(x)h_1(x) + b(x)h_2(x)$. The received word can be decoded
correctly to $\alpha(x)g(x)$ if no more than t errors occurred in n(x). Then one
can find a(x) and b(x) by using the idempotents of the codes with generators
$g(x)h_1(x)$ and $g(x)h_2(x)$.

The advantage of this scheme over time sharing is that if one source is
not transmitting (i.e., the encoder is transmitting all zeros) the error
correction capability of the code increases. For example, let N = 63,
$g(x) = m_1(x)m_3(x)m_5(x)m_7(x)$, $h_1(x) = m_9(x)m_{11}(x)m_{13}(x)$ and
$h_2(x) = m_{?}(x)m_{23}(x)m_{27}(x)m_{31}(x)$ (the index of the first factor is
illegible in the source), where $m_i(x)$ is the minimum function of $\alpha^i$
and $\alpha$ is a primitive element of GF(64). Then $g(x)h_1(x)$ is the
generator polynomial of a 7 error correcting code and $g(x)h_2(x)$ is the
generator polynomial of an 8 error correcting code, while g(x) alone generates
the 4 error correcting BCH code. Thus if both sources are transmitting, 4 errors
can be corrected, while if only one source is transmitting the code can correct
7 or 8 errors.

Several  of  the  papers  and  presentations  in  Section  III  are  concerned  with 
coding  for  other  models  of  the  multi-user  channel.  In  particular,  codes  were 
found  for  the  case  where  the  users  are  not  in  word  and  bit  synchronism.  Examples 
were  found  where  good  coding  techniques  allowed  one  to  transmit  at  rates  higher 
than  traditional  modulation  schemes  such  as  time  division  or  frequency  division 
multiplexing. 

A decoding  algorithm  was  developed  for  terminated  rate  1/N  convolutional 
codes  which  is  based  upon  an  algebraic  description  of  such  codes.  This  algorithm 
can  be  applied  to  a received  vector  with  components  from  the  same  alphabet  as 
the  transmitted  code  word.  The  algorithm  uses  the  Viterbi  decoding  algorithm 
as  an  essential  step.  However,  it  is  simpler  than  directly  applying  the  Viterbi 
algorithm  in  the  usual  manner. 
The basic steps in the algorithm are as follows:

Step 1. A code word which is easily calculated from the received vector is
subtracted from the received vector leaving a vector which ends in
a stream of zeros.

Step 2. The Viterbi decoding algorithm is applied to the vector formed in
Step 1 resulting in a tentative code word. Since the vector to be
decoded ends in a stream of zeros, a short cut can be applied to the
Viterbi decoding algorithm to produce this code word.

Step 3. The code word used in Step 1 is added to the tentative code word found
in Step 2 to yield the maximum likelihood code word.

The saving in decoding complexity occurs in Step 2 where the short cut
is applied to the Viterbi algorithm. In this step, the full Viterbi decoding
algorithm is applied until one comes to the string of terminating zeros. From
that point, the algorithm immediately produces the tentative code word. The
efficiency of this technique depends upon the length of the terminating string
of zeros. We have shown that the length of this string of zeros is no less than
the number of error free digits occurring at the end of the transmission of the
terminated convolutional code word.
* 2. "The Capacity Region of a Multiple-Access Discrete Memoryless Channel Can
Increase with Feedback," IEEE Transactions on Information Theory, vol.
Communication Channels," a series of four chapters in the book. Information
vited Paper)
IEEE TRANSACTIONS ON INFORMATION THEORY, JANUARY 1973

The Capacity Region of a Multiple-Access Discrete
Memoryless Channel Can Increase with Feedback

N. THOMAS GAARDER, MEMBER, IEEE, AND JACK K. WOLF, FELLOW, IEEE

Abstract: The capacity of a single-input single-output discrete
memoryless channel is not increased by the use of a noiseless feedback
link. It is shown, by example, that this is not the case for a
multiple-access discrete memoryless channel. That is, it is shown that the
capacity region for such a channel is enlarged if a noiseless feedback link is


Introduction 

Shannon [1] proved that the capacity of a single-input single-output
discrete memoryless channel is not increased even if the
encoder could observe the output of the channel via a noiseless
delayless feedback link. Recently, Liao [2], and then, Slepian
and Wolf [3] gave formulas for the capacity region of a two-input
single-output discrete memoryless channel with independent
encoding of two source messages. After summarizing their
results, we evaluate the performance of a transmission scheme
for this channel, which makes use of noiseless feedback links
from the output to the two encoders. We show that this scheme
yields a vanishingly small error probability for a pair of rates
that lies outside the capacity region.

Capacity  Regions  Without  Feedback 

In this section we summarize the previously published results
concerned with the capacity region of a multiple-access discrete
memoryless channel without feedback. Consider the block
diagram shown in Fig. 1. Two sources are described by a
two-dimensional rate vector $R = (R_1, R_2)$ with nonnegative
components. Let N be a fixed positive integer. Every N time units,
the sources produce a pair of statistically independent random
variables $(U_1, U_2)$, where $U_i$ is uniformly distributed over the
set of integers $\{1, 2, \ldots, M_i = \lceil 2^{NR_i} \rceil\} \triangleq \mathcal{J}_i$,
i = 1,2. Here $\lceil x \rceil$ is the smallest integer greater than or equal to x.

The channel is described by a conditional probability
distribution of the output random variable Y (which takes values
$y \in \mathcal{Y}$) given the inputs $X_1 = x_1 \in \mathcal{X}_1$ and
$X_2 = x_2 \in \mathcal{X}_2$. We denote this conditional probability
distribution $P_{Y|X_1X_2}(y \mid x_1, x_2)$.
The channel is assumed memoryless in the usual sense. That is,
the conditional probability distribution for N-vectors is equal to
the product of the marginal conditional probability distributions.
The encoders are a pair of deterministic mappings from the source
outputs to channel input N-vectors. The mappings are such that
if the sources produce the pair $(U_1 = i, U_2 = j)$, encoder 1
produces the N-vector $x_{1i} \in (\mathcal{X}_1)^N$, which depends only on i,
and encoder 2 produces the N-vector $x_{2j} \in (\mathcal{X}_2)^N$, which depends
only on j.

The decoder is a deterministic mapping from the channel
output N-vector y to the pair $(i^*, j^*)$ where $i^* \in \mathcal{J}_1$,
$j^* \in \mathcal{J}_2$.

We denote the decoder outputs by the pair of random variables
$(U_1^*, U_2^*)$.

Fig. 1. Multiple-access communication system.


For a given N, rate vector R, a pair of encoders, and a decoder,
we can calculate the probability of error for the code, namely,
the probability

    $P_e = \Pr[(U_1^* \ne U_1) \text{ or } (U_2^* \ne U_2)]$.

A rate vector R is said to be admissible, if for every $\epsilon > 0$ there
exists a positive integer N, such that for this N and R, encoders
and a decoder exist for which $P_e \le \epsilon$. The closure of the set of
admissible rate vectors is called the capacity region. In less
precise words, for every rate vector in the capacity region, one
can create a communications system with arbitrarily small
error probability, and for every rate vector outside the capacity
region, one cannot.

For any three random variables A, B, and C with joint
probability distribution $P_{ABC}(a,b,c)$, the conditional mutual
information $I(A; B \mid C)$ is defined as

    $I(A; B \mid C) = \sum_a \sum_b \sum_c P_{ABC}(a,b,c) \log \frac{P_{AB|C}(a,b \mid c)}{P_{A|C}(a \mid c)\, P_{B|C}(b \mid c)}$    (1)

where all logarithms are taken to the base 2.

Furthermore, let $\mathcal{P}$ denote the class of joint distributions
$P_{X_1X_2Y}(x_1,x_2,y)$ that can be written in the form

    $P_{X_1X_2Y}(x_1,x_2,y) = P_{Y|X_1X_2}(y \mid x_1, x_2)\, P_{X_1}(x_1)\, P_{X_2}(x_2)$    (2)

for all $x_1 \in \mathcal{X}_1$, $x_2 \in \mathcal{X}_2$, and $y \in \mathcal{Y}$.
Denote by $R(P_{X_1X_2Y})$ the set of vectors $R = (R_1, R_2)$ such that

    $0 \le R_1 \le I(X_1; Y \mid X_2)$    (3a)

    $0 \le R_2 \le I(X_2; Y \mid X_1)$    (3b)

    $0 \le R_1 + R_2 \le I(X_1, X_2; Y)$    (3c)

where the mutual informations are computed using the joint
distribution (2). The capacity region is then given as

    convex hull $\bigcup R(P_{X_1X_2Y})$,    (4)

where the union is taken over all probability distributions in $\mathcal{P}$.

Specific Example

Consider a channel with alphabets $\mathcal{X}_1 = \mathcal{X}_2 = \{0,1\}$,
$\mathcal{Y} = \{0,1,2\}$, transition probabilities

    $P_{Y|X_1X_2}(0 \mid 0,0) = P_{Y|X_1X_2}(1 \mid 1,1) = P_{Y|X_1X_2}(2 \mid 0,1) = P_{Y|X_1X_2}(2 \mid 1,0) = 1$    (5)

and all other transition probabilities equal to zero. Such a
channel is deterministic in that every pair of inputs always yields
the same output. Note that the output does not uniquely
determine the input. Specifically, the output symbol 2 could have
resulted from either $X_1 = 0$, $X_2 = 1$ or $X_1 = 1$, $X_2 = 0$. We
will call the output symbol 2 an erasure. Let

    $P_{X_1}(0) = \alpha$ and $P_{X_2}(0) = \beta$.

Then the mutual informations that appear in (3) can be evaluated
to yield

    $I(X_1; Y \mid X_2) = h(\alpha) = -\alpha \log \alpha - (1 - \alpha) \log (1 - \alpha)$

    $I(X_2; Y \mid X_1) = h(\beta) = -\beta \log \beta - (1 - \beta) \log (1 - \beta)$

    $I(X_1, X_2; Y) = -\alpha\beta \log \alpha\beta - [\alpha(1 - \beta) + \beta(1 - \alpha)] \log [\alpha(1 - \beta) + \beta(1 - \alpha)] - (1 - \alpha)(1 - \beta) \log (1 - \alpha)(1 - \beta)$.

All three mutual informations are simultaneously maximized
when $\alpha = \beta = 1/2$. The capacity region, as given by (4),
is then

    $C = \{(R_1, R_2): 0 \le R_1 \le 1,\ 0 \le R_2 \le 1,\ 0 \le R_1 + R_2 \le 1.5\}$

and is depicted in Fig. 2.
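
The three bounds of (3) can be evaluated numerically for this channel. In the Python sketch below, `alpha` and `beta` denote the probabilities that the first and second inputs equal 0; the function names are assumptions. Because the channel is deterministic, I(X1,X2;Y) equals the entropy of the output and each conditional mutual information equals the entropy of the corresponding input.

```python
from math import log2

def entropy(probs):
    """Entropy in bits; zero-probability terms are skipped."""
    return -sum(p * log2(p) for p in probs if p > 0)

def region_bounds(alpha, beta):
    """Returns (I(X1;Y|X2), I(X2;Y|X1), I(X1,X2;Y)) for the example channel
    with independent inputs, P(X1 = 0) = alpha and P(X2 = 0) = beta."""
    i1 = entropy([alpha, 1 - alpha])
    i2 = entropy([beta, 1 - beta])
    p_y = [alpha * beta,                              # Y = 0
           (1 - alpha) * (1 - beta),                  # Y = 1
           alpha * (1 - beta) + beta * (1 - alpha)]   # Y = 2 (erasure)
    return i1, i2, entropy(p_y)

bounds = region_bounds(0.5, 0.5)
print(bounds)
print("rate pair (0.76, 0.76) inside region:", 0.76 + 0.76 <= bounds[2])
```

With equiprobable inputs the sum-rate bound is 1.5 bits per channel use, so the pair (0.76, 0.76), whose sum is 1.52, lies just outside the region without feedback.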

Transmission  with  Feedback 

For the previous example we give a coding technique that
can be used when noiseless feedback links are available from the
output to the two encoders. We will show that using this
technique, we can achieve a vanishingly small error probability for
rates

    $R_1 = R_2 = 0.76$.

We note that the pair of rates $(R_1, R_2) = (0.76, 0.76)$ falls outside
the capacity region shown in Fig. 2.

We assume that each encoder observes the sequence of output
symbols from the multiple-access channel. The l-th output of the
first encoder can then depend upon the first (l - 1) outputs of
the channel as well as the value of i. Similarly, the l-th output of
the second encoder can depend upon the first (l - 1) outputs
of the channel as well as the value of j.

Let N be a large integer such that (0.76)N is equal to an
integer K. Then each encoder must transmit one of $M = 2^{0.76N} = 2^K$
messages in N uses of the channel. Each encoder first transmits
its respective message by sending K uncoded binary digits, which
if received correctly would identify the messages.

Fig. 2. Capacity region of channel used in example.

Let  us  consider  the  sequence  of  output  symbols  corresponding 
to  this  input.  From  (5),  we  see  that  the  input  is  known  exactly 
whenever  the  output  is  zero  or  one.  However,  for  those  positions 
where  the  output  symbol  is  two,  the  decoder  knows  only  that  the 
input  symbols  were  complements  of  each  other.  Let  Q be  the 
number  of  positions  for  which  the  output  symbol  was  two. 
Since  both  encoders  observe  the  output  symbols  via  a noiseless 
feedback  link,  the  encoders  know  the  positions  where  two 
occurred  at  the  output  of  the  channel  and  also  know  the  other 
input  sequence  exactly.  Both  encoders  can  then  cooperate  to 
retransmit  those  symbols  from  the  first  encoder  corresponding 
to  the  received  erasures.  The  second  encoder’s  output  need  not 
be  sent  since  it  is  known  to  be  the  complement  of  this  sequence. 

We  use  the  remaining  N — K uses  of  the  channel  to  retransmit 
the  output  of  the  first  encoder  corresponding  to  the  erasures. 
Since  the  encoders  can  cooperate  completely  in  this  endeavor, 
they  can  send  different  message  patterns  in  these  (N  — K) 
uses  of  the  channel.  This  transmission  is  accomplished  by  using 
the  three  input  pairs  (0,0),  (0,1),  and  (1,1),  which  are  received 
error  free  at  the  receiver. 

If $2^Q \le 3^{N-K}$, the decoder will be able to reconstruct the two
messages without error. If $2^Q > 3^{N-K}$ we will declare an error.
Although this scheme could be improved upon, we now show
that the probability of error can be made as small as desired by
choosing N large enough.

The probability of error is then

    $\Pr[Q > \log_2 3^{N-K} = N(0.24) \log_2 3]$.    (10)

However, Q is a random variable with mean

    $\bar{Q} = K/2 = 0.38N$    (11)

and variance

    $\sigma^2 = K/4 = 0.19N$.    (12)

Therefore, by the Chebyshev inequality,

    $\Pr[Q > (0.24)(\log_2 3)N] < \Pr[|Q - \bar{Q}| > (0.00039)N] \le \frac{\sigma^2}{[(0.00039)N]^2} = \frac{0.19}{(0.00039)^2 N}$    (13)

which can be made as small as desired by choosing N large
enough.

Discussion

It has been shown that the capacity region of a multiple-access
discrete memoryless channel can be increased by feedback. An
unsolved problem is to determine the capacity region when
feedback is available. One obvious outer bound for this region
is to replace the union in (4) by the union over all joint
probabilities that can be written in the form

    $P_{X_1X_2Y}(x_1,x_2,y) = P_{Y|X_1X_2}(y \mid x_1, x_2)\, P_{X_1X_2}(x_1,x_2)$.    (14)

It is conjectured, however, that this is only a bound and that, in
general, not all rates in this region are admissible.
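
The constants in the error bound (13) for the feedback scheme can be checked numerically. The Python sketch below recomputes the gap between the threshold and the mean of Q and evaluates the Chebyshev bound for several block lengths. The function name is an assumption, and the sketch ignores the requirement that 0.76N be an integer.

```python
from math import log2

RATE = 0.76                      # common rate R1 = R2 of the feedback scheme

def chebyshev_bound(n):
    """Upper bound on Pr[Q > (N - K) log2 3] with K = RATE * n, where Q is
    binomial with K trials and success probability 1/2."""
    k = RATE * n
    mean, variance = k / 2, k / 4
    threshold = (n - k) * log2(3)
    return variance / (threshold - mean) ** 2

gap_per_symbol = (1 - RATE) * log2(3) - RATE / 2
print(f"gap per channel use: {gap_per_symbol:.5f}")
for n in (10**6, 10**7, 10**8, 10**9):
    print(n, chebyshev_bound(n))
```

The gap of about 0.00039 per channel use is small, so the bound only drops below 1 for block lengths above roughly a million; it nevertheless decreases as 1/N, which is all the argument requires.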


THE USE OF CONSTANT WEIGHT BLOCK CODES FOR THE UNDERWATER CHANNEL


J. Pieper, R. Reed, J. Proakis, J. Wolf


ABSTRACT

The use of coding to improve data communication
over the underwater acoustic channel is
discussed. It is shown that for this application,
constant weight block codes are particularly
appropriate. Methods of forming such codes are
presented. A representative example shows that the
described use of constant weight codes results in a
performance increase of several dB relative to
conventional uncoded diversity systems.


The use of coding in data communications
systems is today widespread. This may range from the
use of a simple parity check for data storage and
transmission on magnetic tape to very sophisticated
block or convolutional coding techniques for
satellite communications systems. However, coding
remains a relatively unused technique in
underwater acoustic communications. In this paper, we
briefly discuss the application of coding, in
particular constant weight block codes, to the
underwater acoustic channel.

The underwater acoustic channel poses
particularly difficult problems to the communications
engineer. Although there are many aspects of it
which must be considered in the design of a
communication system, there are two that are of immediate
interest to us here. The first of these is the
fact that multiple propagation paths exist that give
rise to multipath fading. Thus, in order to
achieve reliable communication, a high order of
diversity is required. The second aspect is that,
due to motion of the surface and/or the transmit or
receive platforms, the individual path lengths are
time varying. This imposes a random phase
modulation upon any received waveform, the magnitude of
which is normally considered to be great enough to
preclude any phase sensitive communications
techniques (e.g., PSK). Thus the basic techniques
commonly proposed for underwater data communications
are on/off keying (OOK), frequency shift keying
(FSK), or M-ary FSK. These are used to form
multitone waveforms, with several orders of diversity
achieved by redundant transmission.


By the use of coding, a more efficient utilization of the available
time/frequency signaling space is made. This leads to an improved level of
performance relative to conventional uncoded diversity systems.

We consider a data communications system such as in Figure 1. Here, the
data to be transmitted is separated into blocks, each of k bits. Each block
is then encoded into a block of n bits using an (n, k) block code. These n
bits are assigned to n cells within the available signaling space. If a bit
is a one, then a tone pulse is transmitted within that cell; if the bit is
a zero, no energy is transmitted. For simplicity, we picture these n cells
as all being located within and filling one time slice. Then, corresponding
to one code word, a multitone on/off keyed (MTOOK) waveform is generated.
In practice, it may be necessary to scatter the cells of one code word in
both time and frequency in order to obtain independent fading; our
technique is readily extended to this case.

To recover the information bits corresponding to a received waveform,
maximum likelihood decoding is employed. The squared magnitudes of the
responses of the matched filters corresponding to the n cells of a code
word are formed. A set of decision variables is then formed, one for each
hypothesis, where each hypothesis is a word in the code set. The decision
variables are formed by taking the dot product of the code word with this
vector of squared magnitudes. That is, the received power levels in the
cells corresponding to ones in the hypothesized code word are added
together; the other responses are ignored. The code word corresponding to
the largest decision variable is chosen and the decoded data bits
determined therefrom. We note that the complexity of this procedure
increases with the number of words in the code. However, with the rapid
advances in computer technology, maximum likelihood decoding of moderately
large (k ≈ 10) codes is already feasible in many real time applications.
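The decision rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `mtook_ml_decode`, the four-word weight-2 code and the cell power values are all made up for the example, and the code book is assumed to be constant weight so that no normalization is applied.

```python
def mtook_ml_decode(codebook, powers):
    """Return the index of the code word with the largest decision variable.

    codebook: list of equal-length 0/1 tuples (assumed constant weight).
    powers:   squared matched-filter magnitudes, one per signaling cell.
    """
    best_index, best_metric = -1, float("-inf")
    for index, word in enumerate(codebook):
        # Dot product of the word with the power vector: add the received
        # power in the cells where the hypothesized word contains a one.
        metric = sum(p for bit, p in zip(word, powers) if bit)
        if metric > best_metric:
            best_index, best_metric = index, metric
    return best_index


# Hypothetical weight-2 code of length 4 and made-up cell powers.
codebook = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)]
powers = [0.2, 2.5, 0.4, 3.1]
decoded_index = mtook_ml_decode(codebook, powers)
print(codebook[decoded_index])
```

The loop visits every word in the code, which is the source of the growth in complexity with code size noted above.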

In general, such a procedure requires a normalization by the weight (the
number of one bits) of each hypothesized code word. However, if a constant
weight code is used, this normalization is not required. This
simplification of the receiver processing is one advantage of using
constant weight codes. Similarly, by using a constant weight code, all
waveforms have the same average signal power, which simplifies the design
of the transmitter. However, the most significant advantage of constant
weight codes in this application is that they guarantee a readily
determined effective order of diversity.

Figure 1. Model of Communications System

To see this, consider first the use of an arbitrary code. Assume that code
word x was transmitted as an MTOOK waveform. The maximum likelihood
receiver effectively compares the hypothesis x against all other possible
code words. In the decision between x and some other hypothesized code word
y, the order of diversity is given by the number of bit positions in which
x contains a one (signal energy) and y contains a zero (no signal energy).
The effective order of diversity in the entire decoding process is the
minimum obtained over the set of all such individual decisions. This will
change as the actual code word x changes and the error rate will
accordingly be different for different code words. Intuitively, this
represents a poorly designed system in that the error rate is dependent
upon the actual data, i.e., the communications system represents a
non-symmetric channel in the information theoretic sense.

However, for a constant weight code, this problem vanishes. Given any two
code words x and y, the number of bit positions in which x contains a one
and y a zero is the same as the number of bit positions in which x contains
a zero and y a one. Then, the effective order of diversity is readily shown
to be the same for all code words and is further seen to be simply one half
of the minimum distance of the code. Thus, not only does a constant
effective order of diversity exist for the communication procedure when a
constant weight block code is used, but also the order and hence the system
performance can be readily determined.
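The contrast between an arbitrary code and a constant weight code can be checked numerically. The sketch below is illustrative only: the helper names and the two tiny length-4 codes are assumptions chosen for the example, not codes taken from the paper.

```python
from itertools import permutations


def pairwise_diversity(x, y):
    """Number of positions in which x contains a one and y contains a zero."""
    return sum(1 for a, b in zip(x, y) if a == 1 and b == 0)


def diversity_per_word(code):
    """Effective order of diversity seen when each word x is transmitted."""
    return {x: min(pairwise_diversity(x, y) for y in code if y != x)
            for x in code}


def minimum_distance(code):
    """Minimum Hamming distance over all pairs of distinct words."""
    return min(sum(a != b for a, b in zip(x, y))
               for x, y in permutations(code, 2))


# Hypothetical example codes of length 4.
arbitrary_code = [(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]
constant_weight_code = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)]

print(diversity_per_word(arbitrary_code))        # varies with the word sent
print(diversity_per_word(constant_weight_code))  # identical for every word
print(minimum_distance(constant_weight_code) // 2)
```

For the constant weight example every word sees the same order of diversity, and that value equals one half of the minimum distance, in line with the argument above.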

Although the construction of block codes with good distance properties has
been well studied, the construction of constant weight block codes has
received little attention. We note that any constant weight code must be
nonlinear as it cannot contain the all zero (identity) code word. Much of
coding theory has been directed towards the study of linear codes; here, we
briefly present a few methods of constructing constant weight codes that
allow one to take advantage of this prior work.

In  general,  an  arbitrary  block  code  is  first 
selected — usually  on  the  basis  of  some  desired  pro- 
perty, such  as  minimum  distance,  weight  distri- 
bution or  word  length.  From  this  initial  code,  a 
constant  weight  code  is  then  formed. 

One method of doing this involves a nonlinear transformation. In each word
of the original code, one binary sequence is substituted for every
occurrence of a zero and a different sequence is substituted for every
occurrence of a one. The two sequences are of the same length and weight.
The simplest example consists of replacing every zero with the pair (0, 1)
and every one with the pair (1, 0). The word length and minimum distance of
the resulting code will in this case be doubled; the weight will be one
half the new word length. The total number of code words and hence the
information content of a code word is unchanged.
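The pair substitution is easy to try on a small code. In this sketch the function names and the (3, 2) even-parity starting code are assumptions made for illustration; only the mapping 0 to (0, 1) and 1 to (1, 0) comes from the text.

```python
def to_constant_weight(word):
    """Replace every zero with the pair (0, 1) and every one with (1, 0)."""
    out = []
    for bit in word:
        out.extend((1, 0) if bit else (0, 1))
    return tuple(out)


def minimum_distance(code):
    """Minimum Hamming distance over all pairs of distinct words."""
    words = list(code)
    return min(sum(a != b for a, b in zip(x, y))
               for i, x in enumerate(words) for y in words[i + 1:])


# Hypothetical initial code: the (3, 2) even-parity code.
original = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
transformed = [to_constant_weight(word) for word in original]

for before, after in zip(original, transformed):
    print(before, "->", after, "weight", sum(after))
print(minimum_distance(original), minimum_distance(transformed))
```

Each transformed word has twice the original length and a weight of half the new length, and the number of words is unchanged.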

A second method is expurgation. In this method, from the initial code a
subset is selected consisting of all words of a certain weight. Several
different constant weight codes can be obtained from one initial code by
varying the chosen weight. Using this method, the word size is unchanged,
the number of words in the code is decreased, and the new minimum distance
is at least the original value.
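Expurgation can be illustrated on a small linear code. The sketch below uses a (7, 4) Hamming code as the initial code purely as an assumption for the example (the paper's own example starts from the extended Golay code); the generator matrix `G` and the helper names are hypothetical.

```python
from itertools import product

# Assumed generator matrix of a (7, 4) Hamming code in systematic form.
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]


def all_codewords(generator):
    """Enumerate every binary linear combination of the generator rows."""
    n = len(generator[0])
    words = set()
    for message in product((0, 1), repeat=len(generator)):
        word = tuple(sum(m * row[j] for m, row in zip(message, generator)) % 2
                     for j in range(n))
        words.add(word)
    return sorted(words)


def expurgate(code, weight):
    """Keep only the words of the chosen weight."""
    return [word for word in code if sum(word) == weight]


def minimum_distance(code):
    words = list(code)
    return min(sum(a != b for a, b in zip(x, y))
               for i, x in enumerate(words) for y in words[i + 1:])


initial = all_codewords(G)
weight3 = expurgate(initial, 3)
print(len(initial), minimum_distance(initial))
print(len(weight3), minimum_distance(weight3))
```

Choosing `weight=4` instead yields a different constant weight code from the same initial code, which is the flexibility the text refers to.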

Another method of forming a constant weight code that is sometimes
described involves the formation of a Hadamard code (Ref. 2). These codes
are formed from a Hadamard matrix and have the desirable properties that
all code words except two (the all-zero and all-one words) are of constant
weight and that the minimum distance is one half the word length. However,
a Hadamard code is actually a linear code and the formation of a constant
weight code from a Hadamard code is simply a special case of the
expurgation technique.

Yet another method of forming a constant weight block code involves the
concatenation of two codes, one of which is non-binary. This results in
very large codes which can be efficiently decoded. Details are presented in
References 3 and 4.

As an example of some of these techniques, we use the familiar (24, 12)
extended Golay code as an initial code. Parameters of constant weight codes
formed from this code by nonlinear transformation and by expurgation are
given below.

In this table, we list two parameters that are important in describing the
performance of the resulting communications system. The bandwidth expansion
factor is defined as the ratio of the number of bits in a code word to the
number of information bits conveyed by a code word and is hence a measure
of the amount by which the use of the code increases the bandwidth beyond
the minimal amount required to achieve the same data transfer rate (i.e., a
single order of diversity). We also show the effective order of diversity
of the coded system, noting that this value is consistently greater than
the bandwidth expansion factor. We observe that the more traditional use of
purely redundant data transmission (no coding) with OOK obtains an order of
diversity equal to the bandwidth expansion factor; with FSK, the level of
diversity is only half this value. Thus, the use of block coding results in
more diversity and hence better performance for the same bandwidth
expansion; alternatively, to obtain the same level of diversity, less
bandwidth is required in the coded system.
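The two parameters can be recomputed from the word length, number of words and minimum distance of each constant weight code. The sketch below assumes that the information content is the number of whole bits that the code can convey (the base-2 logarithm of the number of words, rounded down), which is consistent with the tabulated values; the function names are hypothetical.

```python
def information_bits(num_words):
    """Whole information bits conveyed by a code with num_words words."""
    return num_words.bit_length() - 1  # floor(log2(num_words))


def bandwidth_expansion(word_size, num_words):
    """Code word bits divided by information bits."""
    return word_size / information_bits(num_words)


def effective_diversity(min_distance):
    """For a constant weight code: one half of the minimum distance."""
    return min_distance // 2


# (label, word size, number of words, minimum distance), from the table
constant_weight_codes = [
    ("0 -> 01, 1 -> 10", 48, 4096, 16),
    ("expurgated, w = 8", 24, 759, 8),
    ("expurgated, w = 12", 24, 2576, 8),
]

for label, n, words, d in constant_weight_codes:
    print(label,
          information_bits(words),
          round(bandwidth_expansion(n, words), 1),
          effective_diversity(d))
```

In each row the effective order of diversity exceeds the bandwidth expansion factor, which is the comparison the paragraph above draws against uncoded redundancy.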


To quantify these effects, we consider several communications systems, each
of which requires a bandwidth four times the information transfer rate.
First, we consider systems with pure redundancy. For a bandwidth expansion
factor of 4, there are three possible conventional systems applicable to
the underwater channel. These are OOK with four levels of diversity, FSK
with two levels of diversity and 4-ary FSK with two levels of diversity. We
next consider two systems employing MTOOK transmission of constant weight
block encoded data. One block code considered is the (48, 12) constant
weight code obtained from the Golay code (effective order of diversity =
8); the other is a Hadamard code of word length 20 (effective order of
diversity = 5). Performance of these systems was calculated under the
assumptions of independent fading and equal signal-to-noise ratio per
signaling cell. The error rates, as measured in equivalent bit error
probability versus signal-to-noise ratio per information bit, are shown in
Figure 2. As can be seen, at error rates of practical interest (e.g.,
Pb = 10“') the coded systems require 3 to 6 dB less signal-to-noise ratio
than do the systems using pure redundancy.

In  conclusion,  the  use  of  block  codes  with 
multitone  on/off  keying  offers  a convenient  method 
of  constructing  waveforms  appropriate  for  data 
transmission  over  the  underwater  acoustic  channel. 
The  use  of  a constant  weight  code  simplifies  the 
receiver  processing,  results  in  constant  average 
energy  waveforms  and  guarantees  a readily  deter- 
mined effective  level  of  diversity.  The  perfor- 
mance of  the  resulting  system  is  significantly 
superior  to  that  obtained  with  more  conventional 
diversity  techniques. 


EXAMPLES OF CONSTANT WEIGHT CODES

                          Original         0 -> 01     Expurgated   Expurgated
                          Extended Golay   1 -> 10     w = 8        w = 12

Number of words in code   4096             4096        759          2576
Word size                 24               48          24           24
Information content       12               12          9            11
Minimum distance          8                16          8            8
Weight                    variable         24          8            12
Bandwidth expansion       2.0              4.0         2.7          2.2
Effective diversity       -                8           4            4

Figure 2. Performance of Various Communications Systems; Bandwidth
Expansion Factor = 4

Efficient Maximum Likelihood Decoding of Linear Block Codes Using a Trellis

JACK  K.  WOLF,  FELLOW,  IEEE 


Abstract: It is shown that soft decision maximum likelihood decoding of any
(n,k) linear block code over GF(q) can be accomplished using the Viterbi
algorithm applied to a trellis with no more than $q^{(n-k)}$ states. For
cyclic codes, the trellis is periodic. When this technique is applied to
the decoding of product codes, the number of states in the trellis can be
much fewer than $q^{n-k}$. For a binary (n,n-1) single parity check code,
the Viterbi algorithm is equivalent to the Wagner decoding algorithm.

I.  Introduction 

TWO DISTINCT error-control techniques exist for the reliable transmission
of digital data over noisy communications channels: block codes and
convolutional codes. There are many similarities and differences between
these techniques. One important difference is that much more efficient
algorithms have been found for using channel measurement information (i.e.,
soft decisions) in the decoding of convolutional codes than in the decoding
of block codes.

This paper is concerned with the maximum likelihood decoding of linear
block codes using channel measurement information. By maximum likelihood
decoding, we mean a decoding algorithm which results in the minimum
probability of decoding to an incorrect code word when the a priori
probabilities of all the code words are equal. By using channel measurement
information, we mean that the decoding algorithm can utilize real numbers
(e.g., the analog outputs of filters matched to the signals) associated
with each component of the code word. The decoding algorithm will be of
particular use in decoding high-rate codes, since the complexity of the
algorithm will be upper-bounded by a function of the number of parity
symbols.

The following results are demonstrated in this paper.

1) Soft decision, maximum likelihood decoding of any (n,k) linear block
code over GF(q) can be accomplished using the Viterbi [1] algorithm applied
to a trellis having no more than $q^{(n-k)}$ states.

2) If the linear code is cyclic, the trellis is periodic.

3) If the linear code is a product code, the number of states required can
be considerably less than $q^{(n-k)}$.

4) For a binary (n,n-1) single-parity check code, the Viterbi algorithm
applied to the trellis is equivalent to Wagner decoding [2].

Manuscript received July 26, 1976; revised April 29, 1977. This research
was supported by the Air Force Office of Scientific Research, Air Force
Systems Command, USAF, under Grant AFOSR-74-2601.

The author is with the Department of Electrical and Computer Engineering,
University of Massachusetts, Amherst, MA 01003.


Some, if not all, of these results can be deduced from previously published
papers. However, the practical significance of our technique makes it
appear worthwhile to present them together here. A comparison between the
decoding complexity of our technique and that of the usual word correlation
decoding should amplify this point. Consider the maximum likelihood
decoding of a (31,26) binary Hamming code using channel measurement
information. If word correlation decoding is utilized, the received data
would be compared with each of the $2^{26}$ code words. Using the Viterbi
algorithm, a trellis with only $2^5$ states is utilized. Both the word
correlation decoder and the Viterbi decoder will decode to the same
codeword and thus give identical performance. In this case, the advantage
of instrumenting a Viterbi decoder rather than a word-correlation decoder
should be obvious.

The concepts presented in this paper have some similarity to the work of
Bahl et al. [3] and that of Hartmann and Rudolph [4]. However, in both of
these papers, the authors were concerned with minimizing the probability of
symbol error rather than the probability of word error. Miyakawa and Kaneko
[5] have presented a different decoding algorithm for maximum likelihood
decoding of linear codes using channel measurement information. Their
algorithm appears to require a decoder with greater complexity than that
discussed here. For example, for the (31,26) binary Hamming code, their
decoder considers 3684 error patterns. Chase [6] and others [7]-[10] have
given suboptimum decoding algorithms which are relatively simple to
instrument, but which do not always achieve maximum likelihood decoding.

The  technique  described  here  has  been  applied  to  a 
concatenated  coding  scheme  where  constant  weight  binary 
block  codes  are  transmitted  over  a fading  channel.  The 
details  will  be  discussed  elsewhere  [11]. 

We  first  give  a general  formulation  which  holds  for  all 
linear  block  codes.  We  then  consider  the  case  of  cyclic 
linear  codes.  Finally,  we  consider  product  codes  and  show 
that,  for  such  codes,  the  number  of  states  in  the  trellis  is 
greatly  reduced  over  what  we  might  expect  from  the 
treatment  of  the  general  problem. 

II.  Linear  Codes  Over  GF(q) 

Denote the elements of the finite field GF(q) as $\alpha_j$,
$j = 0,1,2,\ldots,(q-1)$. Consider a linear (n,k) code over GF(q) with
parity check matrix H. Denote the ith column of H as $h_i$, so that $h_i$,
$i = 1,2,\ldots,n$, are (n-k)-tuples with elements from GF(q). The
codewords in the code are all the n-tuples X with elements from GF(q) such
that $HX = 0$. Here 0 is the all-zero (n-k)-tuple.

We now define a trellis for this code. A trellis is a particular collection
of nodes (or states) interconnected by unidirectional edges. The nodes will
be grouped into sets indexed by a parameter k, $k = 0,1,2,\ldots,n$. A node
indexed by a particular value of k will be said to be at depth k. Edges
will be drawn between certain pairs of nodes at depth k and at depth
(k + 1), for $k = 0,1,\ldots,(n-1)$, with the direction of the edge going
from the node at depth k to the node at depth (k + 1). At any depth k,
there will be at most $q^{(n-k)}$ nodes. The nodes at depth k will be
identified by (n-k)-tuples, $s_i(k)$, with elements from GF(q) for certain
values of i. All of the $q^{(n-k)}$ (n-k)-tuples are assumed to be ordered
from 0 to $q^{(n-k)} - 1$, with 0 referring to the all zero (n-k)-tuple.
$s_i(k)$ is to be interpreted as the ith (n-k)-tuple in this list. Since
not all of the (n-k)-tuples may correspond to nodes at a depth k, we let
$I_k$ be the subset of the integers $\{0,1,\ldots,q^{(n-k)} - 1\}$
corresponding to those (n-k)-tuples which correspond to nodes at depth k.
The edges are labeled in a manner to be described below.

A trellis is a compact method of cataloging all of the $q^k$ codewords of a
linear code. Each distinct codeword corresponds to a distinct path in the
trellis. In order to see how this correspondence occurs, we describe how to
construct the trellis for a particular code.

1) At depth k = 0, the trellis contains only one node, namely $s_0(0)$, the
all-zero (n-k)-tuple.

2) For each $k = 0,1,\ldots,(n-1)$, the collection of nodes at depth
(k + 1) is obtained from the collection of nodes at depth k by the formula

$$s_l(k+1) = s_i(k) + \alpha_j h_{k+1},$$

for all $i \in I_k$ and $j = 0,1,\ldots,(q-1)$.

For each i in $I_k$, connecting lines are drawn between the node $s_i(k)$
and the q nodes formed from it at depth (k + 1) using the above formula.
Each such line is labeled by the particular value of $\alpha_j$ which
formed $s_l(k+1)$ from $s_i(k)$.

3) We remove any nodes that do not have a path to the all-zero state at
depth n, and we remove all lines drawn to these expurgated nodes.

There is a one-to-one correspondence between each codeword in the code and
the sequence of $\alpha_j$ on any path from the all-zero node at depth 0 to
the all-zero node at depth n. There are $q^k$ distinct paths through this
trellis, and each such path corresponds to a unique codeword.

For k = 0, we have only one state or node in the trellis: $s_0(0)$. For
k = 1, we have q states: namely, $\alpha_j h_1$, $j = 0,1,\ldots,(q-1)$.
For an arbitrary depth k, $1 \le k \le n$, we have the states
$\alpha_{i_1} h_1 + \alpha_{i_2} h_2 + \cdots + \alpha_{i_k} h_k$, where +
is the addition operator defined for vectors with components from the field
GF(q). Note that the number of states at any depth cannot exceed the number
of distinct (n-k)-tuples with elements from GF(q).
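The three construction steps can be sketched for the binary case as follows. This is an illustration under stated assumptions, not the paper's own program: the function names are hypothetical, states are stored as tuples rather than as list indices, and the parity check matrix `H` below is an arbitrary (5,3) example that should not be taken to be the matrix used in the paper.

```python
def build_trellis(H):
    """Return edges[k] = list of (state at depth k, symbol, state at depth k+1)."""
    r, n = len(H), len(H[0])
    zero = (0,) * r
    columns = [tuple(H[row][col] for row in range(r)) for col in range(n)]

    # Steps 1 and 2: start at the all-zero node and add symbol * h_{k+1}.
    nodes = [{zero}]
    edges = []
    for k in range(n):
        layer = []
        for state in nodes[k]:
            for symbol in (0, 1):
                nxt = tuple((s + symbol * h) % 2
                            for s, h in zip(state, columns[k]))
                layer.append((state, symbol, nxt))
        edges.append(layer)
        nodes.append({dst for _, _, dst in layer})

    # Step 3: working backward, drop nodes with no path to the all-zero
    # node at depth n, together with the lines drawn to them.
    alive = {zero}
    for k in range(n - 1, -1, -1):
        edges[k] = [edge for edge in edges[k] if edge[2] in alive]
        alive = {src for src, _, _ in edges[k]}
    return edges


def count_paths(edges):
    """Count paths from the all-zero node at depth 0 to that at depth n."""
    zero = (0,) * len(edges[0][0][0])
    counts = {zero: 1}
    for layer in edges:
        nxt = {}
        for src, _symbol, dst in layer:
            if src in counts:
                nxt[dst] = nxt.get(dst, 0) + counts[src]
        counts = nxt
    return counts.get(zero, 0)


# Hypothetical parity check matrix of a binary (5,3) code.
H = [[1, 1, 0, 1, 0],
     [1, 0, 1, 0, 1]]
trellis = build_trellis(H)
print(count_paths(trellis))
print([len({src for src, _, _ in layer}) for layer in trellis])
```

The path count equals the number of codewords, and the number of nodes at any depth stays within the bound given by the number of distinct (n-k)-tuples.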

We illustrate the construction of a trellis for the binary (5,3) code with
parity check matrix

H = [matrix not legible in the source]

Following the procedure outlined above, we find the trellis before the
expurgation of nodes to be as shown in Fig. 1 and, after expurgation of
nodes, to be as shown in Fig. 2.

Fig. 1. Trellis for binary (5,3) code before expurgation.

Fig. 2. Expurgated trellis for (5,3) code.

For  decoding,  it  is  not  necessary  to  expurgate  nodes  in 
the  trellis,  as  the  Viterbi  algorithm  can  just  as  easily  be 
used  in  the  unexpurgated  trellis. 


III.  Viterbi  Decoding  Using  a Trellis 

Since the Viterbi algorithm is now a well-understood decoding procedure,
only a brief description will be presented here. We assume that decoding is
to be accomplished based upon the received n-tuple c with real components
$c_1, c_2, \ldots, c_n$. We assume no intersymbol interference so that the
jth component of c depends only on the jth component of the transmitted
code word, $x_j$. We further assume that the noise contributions in each of
these components are described by statistically independent random
variables $N_i$ with probability density functions $f_{N_i}(\cdot)$,
$i = 1,2,\ldots,n$. Then the logarithm of the likelihood ratio of the data,
given the transmitted codeword, is of the form

$$\log f_{c|X}(c|X) = \sum_{i=1}^{n} \log f_{N_i}(c_i|x_i)
  = \sum_{i=1}^{n} z_i(c_i, x_i) \triangleq Z(X).$$

For a given received data sequence c, a maximum likelihood decoder finds
the codeword X which gives the largest value of Z(X). A brute force
approach would suggest trying all $q^k$ possible codewords.

The Viterbi algorithm is a recursive algorithm whereby many codewords can
be discarded from consideration in finding that X which maximizes Z(X).
Referring to the trellis introduced in the previous section, we can state
the procedure as follows. For each node $s_i(k)$ at depth k, assign a real
number $V(s_i(k))$ in accordance with the following rules.

1) For k = 0, set $V(s_0(0)) = 0$.

2) For all $l \in I_{k+1}$, form $V(s_l(k+1))$ from $V(s_i(k))$ in the
following manner ($k = 0,1,2,\ldots,(n-1)$):

$$V(s_l(k+1)) = \max_{\alpha_j \in GF(q),\ i \in P_{k,l} \subseteq I_k}
  \left[ V(s_i(k)) + z_{k+1}(c_{k+1}, \alpha_j) \right],$$

where $P_{k,l}$ is the subset of $I_k$ consisting of the set of indices i
such that, for some $\alpha_j \in GF(q)$,
$s_l(k+1) = s_i(k) + \alpha_j h_{k+1}$.

3) Retain only that path to $V(s_l(k+1))$ from that $s_i(k)$ which gave the
maximum in the above formula.

4) At k = n, the sequence of $\alpha_j$ on the single remaining path from
the all-zero state at depth 0 to the all-zero state at depth n corresponds
to the codeword X which maximizes Z(X).

It  should  be  noted  that  this  algorithm  can  be  used  with 
the  unexpurgated  trellis  or  the  expurgated  trellis. 
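The four rules can be sketched for a binary code as follows. Several things here are assumptions made for the illustration: the function names, the use of the unexpurgated trellis, the branch metric z(c, 1) = c and z(c, 0) = -c (which corresponds to bipolar signaling in Gaussian noise), and the (5,3) parity check matrix `H2`, which is an arbitrary example.

```python
from itertools import product


def viterbi_decode(H, c):
    """Soft-decision decoding over the (unexpurgated) syndrome trellis of H."""
    r, n = len(H), len(H[0])
    zero = (0,) * r
    survivors = {zero: (0.0, [])}          # rule 1: V(s_0(0)) = 0
    for k in range(n):
        column = tuple(H[row][k] for row in range(r))
        nxt = {}
        for state, (value, path) in survivors.items():
            for bit in (0, 1):
                new_state = tuple((s + bit * h) % 2
                                  for s, h in zip(state, column))
                metric = value + (c[k] if bit else -c[k])
                # rules 2 and 3: keep only the best path into each node
                if new_state not in nxt or metric > nxt[new_state][0]:
                    nxt[new_state] = (metric, path + [bit])
        survivors = nxt
    return tuple(survivors[zero][1])       # rule 4: path ending in all-zero


def brute_force_decode(H, c):
    """Word correlation over every codeword, for comparison only."""
    n = len(H[0])
    codewords = [x for x in product((0, 1), repeat=n)
                 if all(sum(h * b for h, b in zip(row, x)) % 2 == 0
                        for row in H)]
    return max(codewords,
               key=lambda x: sum(v if b else -v for b, v in zip(x, c)))


# Single-parity check code of length 5.
H1 = [[1, 1, 1, 1, 1]]
print(viterbi_decode(H1, (3, 2, -4, -1, 4)))

# Hypothetical (5,3) code and made-up received values.
H2 = [[1, 1, 0, 1, 0],
      [1, 0, 1, 0, 1]]
c2 = (0.9, -1.2, 0.3, -0.4, 2.0)
print(viterbi_decode(H2, c2), brute_force_decode(H2, c2))
```

The trellis decoder and the exhaustive search select the same codeword, while the trellis decoder keeps at most one survivor per state at each depth.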

IV.  Decoding  of  Cyclic  Codes  Over  GF(q) 

For cyclic codes over GF(q), an alternative (but equivalent) method of
forming the trellis is to associate the nodes with the $q^{(n-k)}$ states
of the (n-k) stage shift register used for encoding and decoding. For an
(n,k) cyclic code over GF(q) with generator polynomial
$g(x) = g_0 + g_1 x + \cdots + g_r x^r$, $g_i \in GF(q)$, $r = n - k$, one
form of the encoder is as shown in Fig. 3.

The square boxes are storage devices for elements from GF(q), the circles
enclosing + signs are adders for elements from GF(q), and the circles
enclosing $g_i$'s are multipliers for elements from GF(q). ($g_r^{-1}$ is
the multiplicative inverse of $g_r$.) We enter the k message digits at the
input with the switches $S_1$ and $S_2$ in position 1. We then enter (n-k)
0's at the input with the switches $S_1$ and $S_2$ in position 2. The
output is the codeword.

The trellis for this code is built by tracing the possible states of the
storage devices for all possible inputs. Since there are r storage devices
and each device can contain at most q different elements, there will be at
most $q^r$ states in the trellis at any depth. For a reasonable code in
which all encoder states are utilized, the number of trellis states at
depth j in the expurgated trellis is given by the formula

$$\text{number of states in trellis at depth } j =
\begin{cases}
q^j, & j = 1,2,\ldots,r-1 \\
q^r, & j = r, r+1, \ldots, n-r \\
q^{n-j}, & j = n-r+1, \ldots, n.
\end{cases}$$

The trellis is repetitive for $j = r+1, \ldots, n-r$.

The general procedure for interconnecting nodes of the trellis is most
easily described by associating with each node at depth k a polynomial
$s(x;k)$ in x of degree (r - 1) with coefficients from GF(q). These
polynomials play the same roles as the vectors s(k) used previously. The
polynomials at depth (k + 1) are then formed from the polynomials at depth
k in accordance with the formula

$$s_l(x;k+1) = \left( x\, s_i(x;k) + x^r \alpha_j \right) \bmod g(x).$$

Rather than pursue this description in abstract detail, we give two
examples.

Fig. 5. Trellis for depths k = 0, 1, 2, 3, and 4.

Fig. 6. Trellis at depth k and (k + 1) for k = 4, 5, ..., 10.

Fig. 7. Trellis for depths k = 11, 12, ..., 15.

Fig. 9. Encoder for single-parity check code.

Fig. 10. Trellis for single-parity check code.

Example 1: Consider the (15,11) binary cyclic code with generator
polynomial $g(x) = x^4 + x + 1$. The corresponding encoder is given in
Fig. 4.

For the first 11 clock pulses, the switches $S_1$ and $S_2$ are in position
one, and the input consists of the 11 message digits. For the next four
clock pulses, the switches $S_1$ and $S_2$ are in position two, and zeros
are fed in at the input. The first 11 digits which appear at the output are
then the 11 message digits, and the last four digits which appear at the
output are the four check digits.

The state sequence for the encoder for the input sequence 10100101110 is

0000 → 1100 → 0110 → 1111 → 1011 → 1001 → 0100 → 0010 → 1101 → 0110
→ 1111 → 1011 → 0101 → 0010 → 0001 → 0000.

In polynomial notation, each of these states would be represented by a
polynomial. For example, state 1 0 1 1 at depth 11 would be represented by
the polynomial $s(x;11) = 1 + x^2 + x^3$.
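The polynomial update rule can be traced mechanically for Example 1. In this sketch a state is stored as an integer whose bit i holds the coefficient of $x^i$, and a state is printed with the constant term first; these representation choices and the function names are assumptions made for the illustration. The symbols fed in are the 15 digits of the codeword 101001011101101 (the 11 message digits followed by the four check digits).

```python
def next_state(state, symbol, g, r):
    """Return (x * s(x) + x^r * symbol) mod g(x) over GF(2)."""
    value = (state << 1) ^ (symbol << r)
    if value & (1 << r):      # a degree-r term is present: subtract g(x)
        value ^= g
    return value


def state_trace(symbols, g, r):
    """States at depths 0, 1, ..., len(symbols), starting from all zeros."""
    state, trace = 0, [0]
    for symbol in symbols:
        state = next_state(state, symbol, g, r)
        trace.append(state)
    return trace


def as_bits(state, r):
    """Coefficients of 1, x, ..., x^(r-1) written left to right."""
    return "".join(str((state >> i) & 1) for i in range(r))


G_POLY = 0b10011              # g(x) = x^4 + x + 1
codeword = [int(ch) for ch in "101001011101101"]
trace = [as_bits(s, 4) for s in state_trace(codeword, G_POLY, 4)]
print(" -> ".join(trace))
```

Because the symbol sequence is a codeword, the trace starts and ends in the all-zero state; reading the depth-11 state from its highest-order coefficient down gives the four check digits.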


Portions of the trellis for this code are shown in Figs. 5, 6, and 7. The
path corresponding to the codeword 101001011101101 is shown in Fig. 8.

Example 2: The encoder for a binary (n,n-1) single-parity check code with
generator polynomial $g(x) = x + 1$ is shown in Fig. 9, and the resultant
trellis is shown in Fig. 10.

Fig. 11. Steps in decoding single-parity check code.

To illustrate the Viterbi decoding algorithm as applied to this code, we
assume that the binary codeword is transmitted using bipolar signaling over
a Gaussian white-noise channel. The input to the decoder is a vector of n
real numbers $c = c_1 c_2 c_3 c_4 \cdots c_n$. Using word correlation, the
maximum likelihood decoder would form the $2^{n-1}$ numbers

$$Z_i = \sum_{j=1}^{n} \left[ c_j x_{ij} - c_j (1 - x_{ij}) \right],
  \quad i = 1,2,\ldots,2^{n-1},$$

where $x_{ij} \in \{0,1\}$ is the jth component of the ith codeword. The
steps in the Viterbi decoding algorithm for the case of n = 5 and
c = (3,2,-4,-1,4) are shown in Fig. 11. The node values V(k) are the
circled values at each node. The maximum likelihood decoded code word is
(1 1 0 1 1). Note that a hard decision decoder would try to decode the
vector (1 1 0 0 1) which is equidistant from five different codewords, and
thus would fail to decode.

It should be observed that, in the second step of the decoding algorithm, a
final decision has been made on the first digit of the codeword. That is,
at this early point in the decoding, we have already decided that the first
component of the codeword is a 1. This is somewhat surprising: we have not
yet received the parity digit but have already made a final decision on one
of the binary digits! The Wagner decoding algorithm [2], which inverts the
least likely digit in the hard decision sequence if the parity check fails,
also makes such final decisions. Indeed, both algorithms yield maximum
likelihood decoding and so must decode to the same codeword. These early
final decisions are characteristic of Viterbi decoding of arbitrary codes
and are not limited to just the single-parity check code.
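Wagner decoding as described above is short enough to write out. The sketch assumes an even-parity code and the bipolar convention in which a positive received value favors a 1; the function name is hypothetical.

```python
def wagner_decode(c):
    """Hard-decide each digit, then flip the least reliable one if parity fails."""
    hard = [1 if value > 0 else 0 for value in c]
    if sum(hard) % 2 == 1:                       # parity check fails
        weakest = min(range(len(c)), key=lambda j: abs(c[j]))
        hard[weakest] ^= 1                       # invert the least likely digit
    return tuple(hard)


received = (3, 2, -4, -1, 4)
print(wagner_decode(received))
```

For the received vector of Example 2, the hard decisions have odd weight, so the digit with the smallest magnitude (the fourth) is inverted.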


V.  Decoding  of  Product  Codes 

In all cases discussed heretofore, the number of states needed at some
depth in the trellis was equal to $q^{n-k}$, where (n-k) was the number of
parity symbols in the code. For some codes, the maximum number of states
can be much less than $q^{n-k}$; such is the situation for product codes.

Consider a product code with symbols from GF(q) where the row code is an
$(n_1,k_1)$ linear code and the column code is an $(n_2,k_2)$ linear code.
The number of parity symbols is
$r = n_1 n_2 - k_1 k_2 = k_1(n_2 - k_2) + k_2(n_1 - k_1)
+ (n_1 - k_1)(n_2 - k_2)$. In what follows, we give a decoding algorithm
for a product code that requires only $q^{k_1(n_2 - k_2)}$ states. By
symmetry, an algorithm exists with $q^{k_2(n_1 - k_1)}$ states. If one code
is a low-rate code and the other code is a high-rate code, the savings in
decoder complexity is enormous when comparing this algorithm to algorithms
which require $q^r$ or $q^{k_1 k_2}$ states.

For example, consider a binary product code with a
(15,5) three-error correcting row code and a (15,14) single-error
detecting column code. The resultant code is a
(225,70) code with minimum distance 14. The algorithm
described in the previous section would seem to require a
trellis with 2^{(225-70)} = 2^{155} states. A decoding algorithm for
such a trellis is outside the realm of possibility. In the algorithm
to follow, only 2^5 = 32 states would be required in
the trellis.
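The state counts in this example can be checked with a few lines of Python. This is only an arithmetic sketch; the function name `product_code_state_exponents` is our own and does not come from the paper.

```python
def product_code_state_exponents(n1, k1, n2, k2):
    """Return exponents of q in the state counts discussed in the text.

    (n1, k1) is the row code and (n2, k2) the column code.
    """
    r = n1 * n2 - k1 * k2          # total number of parity symbols
    return {
        "n": n1 * n2,              # block length of the product code
        "k": k1 * k2,              # number of information symbols
        "r": r,                    # q^r states for the general trellis
        "row_based": k1 * (n2 - k2),   # q^{k1 (n2 - k2)} states
        "col_based": k2 * (n1 - k1),   # q^{k2 (n1 - k1)} states
    }

# The (15,5) row code and (15,14) column code of the example, q = 2.
example = product_code_state_exponents(15, 5, 15, 14)
print(example)
```

For this choice of component codes the row-based exponent is far smaller than the column-based one, which is why the order in which the two codes are used matters.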

Decoding Algorithm (Binary Case): Let Q_j^{(l)} denote the
correlation of the l-th row of the received matrix with the
codeword from the row code having the information
symbols j = (j_1, j_2, ..., j_{k_1}); j_a ∈ {0,1}. Let h_{ij} denote
the element in the i-th row and the j-th column of the parity
check matrix of the column code. It is assumed this matrix
is in echelon canonical form with a unit matrix on the right;
h_{ij} ∈ {0,1}. Let V^{(l)}(i_1, i_2, ..., i_{r_2}) denote the node value
of state (i_1, i_2, ..., i_{r_2}) after l rows of the received matrix
have been cross-correlated with the codewords in the
product code. Here i_a (a = 1, 2, ..., r_2) is a binary vector
of dimension k_1, so there are (2^{k_1})^{r_2} = 2^{k_1 r_2} states.

Algorithm: 1) Set l = 0 and

V^{(0)}(i_1, i_2, ..., i_{r_2}) = 0 if i_1 = i_2 = ... = i_{r_2} = 0,
V^{(0)}(i_1, i_2, ..., i_{r_2}) = -∞ otherwise.

2) l ← l + 1.

3) Test if l = n_2. If so go to step 5); otherwise go to step 4).

4) Compute, for each (i_1, i_2, ..., i_{r_2}),

V^{(l)}(i_1, i_2, ..., i_{r_2}) = max_j [V^{(l-1)}(i_1 + h_{1l} j, i_2 + h_{2l} j, ..., i_{r_2} + h_{r_2 l} j) + Q_j^{(l)}].

For each state, retain the sequence of j values that resulted
in the maximum V. Return to step 2).

5) Compute

V^{(n_2)}(0, 0, ..., 0) = max_j [V^{(n_2 - 1)}(0, 0, ..., 0, j) + Q_j^{(n_2)}].

The sequence of j values that led to the maximum V^{(n_2)}(0, 0, ..., 0)
is the decoded codeword.

6) Stop.
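The recursion above can be sketched in Python for a very small binary product code. Everything concrete here is an assumption made for illustration: the function names, the (3,2) single-parity-check row and column codes, the received values, and the bipolar mapping (0 to +1, 1 to -1) used for the correlation. The sketch extends survivors forward through all n_2 rows and then reads off the all-zero state, which combines steps 4) and 5) into one loop rather than treating the last row separately.

```python
from itertools import product

def encode_row(j, G):
    """Encode the information tuple j with generator matrix G over GF(2)."""
    return tuple(sum(j[a] * G[a][c] for a in range(len(G))) % 2
                 for c in range(len(G[0])))

def correlate(y_row, codeword):
    """Correlate a received row with the bipolar image of a codeword."""
    return sum(y * (1 - 2 * c) for y, c in zip(y_row, codeword))

def decode_product(Y, G_row, H_col):
    """Trellis decoding of a binary product code.

    Y is the received n2 x n1 matrix, G_row generates the row code, and
    H_col is the r2 x n2 parity-check matrix of the column code. A state
    is a tuple of r2 binary k1-vectors (the partial column syndromes).
    """
    k1 = len(G_row)
    r2, n2 = len(H_col), len(H_col[0])
    infos = list(product((0, 1), repeat=k1))
    zero = tuple((0,) * k1 for _ in range(r2))
    # Only reachable states are stored; a missing state plays the role of -inf.
    survivors = {zero: (0.0, [])}
    for l in range(n2):
        Q = {j: correlate(Y[l], encode_row(j, G_row)) for j in infos}
        extended = {}
        for state, (metric, path) in survivors.items():
            for j in infos:
                nxt = tuple(
                    tuple((s + H_col[a][l] * b) % 2 for s, b in zip(state[a], j))
                    for a in range(r2))
                cand = metric + Q[j]
                if nxt not in extended or cand > extended[nxt][0]:
                    extended[nxt] = (cand, path + [j])
        survivors = extended
    metric, path = survivors[zero]      # all column parity checks satisfied
    return [encode_row(j, G_row) for j in path], metric

# Hypothetical example: (3,2) single-parity-check codes in both directions.
G_ROW = [[1, 0, 1], [0, 1, 1]]
H_COL = [[1, 1, 1]]
Y = [[0.9, -0.2, -1.1], [-0.8, 1.2, -0.3], [0.4, 0.7, 1.0]]
decoded_rows, best_metric = decode_product(Y, G_ROW, H_COL)
print(decoded_rows, best_metric)
```

With k_1 = 2 and r_2 = 1 the dictionary of survivors never holds more than 2^{k_1 r_2} = 4 states, in line with the state count given above.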

A similar algorithm holds for concatenated codes, where
again the number of required states is much reduced over
what might be expected from considering the total number
of parity digits transmitted.

SUMMARY

A survey of error control techniques for achieving reliable transmission over noisy communication channels
is presented. Both binary and nonbinary codes are considered. Block codes and tree codes are described
along with their decoding algorithms. The parameters of the most frequently utilized codes are given.
Finally, the performance of such codes is considered for an additive Gaussian noise channel with and without
Rayleigh fading.

1. INTRODUCTION

Assume  that  you  as  an  author  of  a paper  at  this  symposium  have  received  the  following  telegram  from  the 
Program  Chairman:  PRLGRAM  FOB  ASARO  AVP  SYRPOSIAM  CHLNGEO  YSURLPAPRR  NQW  SCHEDULAD  FOR  JONE  7 AT  13,45. 
After some effort at error detection and correction, you could probably correctly interpret the non-numerical
portion of the text since words in the English language are redundant. That is, not every combination
of the 27 symbols (26 letters plus the space symbol) forms an acceptable message. For example the sequence
of letters "PRLGRAM" is not an English word and thus errors have been detected in this sequence of symbols.
Since it differs from the word "PROGRAM" in only one letter and differs from other words in more than one
letter it is "closest" to the word "PROGRAM". Thus in decoding this word to the word "PROGRAM" we have
accomplished error correction.

The errors in the numerical portion of the telegram (the date and the time) present a different problem.
In general, numbers do not possess the redundancy of non-numerical text. Thus the "7" could have been in
error and we could not detect or correct this error from the natural redundancy of the message.

In designing a system for the reliable transmission of data over a noisy communication channel, one cannot
rely on the natural redundancy of the message to detect and correct errors since the system must work for
all types of messages (even those without natural redundancy such as certain types of computer data). Thus,
we must introduce an artificial redundancy into the messages in order to effect error control. This artificial
redundancy, called coding for error control, is the subject of this paper.

Codes  for  error  control  come  in  two  distinct  flavors:  block  codes  and  tree  (commonly  called  convolutional) 
codes.  The  next  two  sections  are  concerned  with  the  definitions  and  important  characteristics  of  these  two 
classes  of  codes. 

2. BLOCK CODES (WOLF, J. K., 1973)

A block code of length n and size M is a collection of M distinct vectors called codewords, each vector
having n components belonging to some finite alphabet X = {0, 1, 2, ..., q - 1}. The rate of the code, R, is
defined as

R = (log_q M) / n.

Since the codewords are distinct, 1 ≤ M ≤ q^n and 0 ≤ R ≤ 1. For binary codes, q = 2, while for nonbinary
codes q > 2. Usually q is chosen equal to a prime or a power of a prime.

The Hamming weight of a codeword is equal to the number of nonzero components in that vector. The minimum
weight of a code is the positive integer equal to the smallest nonzero Hamming weight of a codeword in the
code.

We assume henceforth that the elements of X form a finite field GF(q) (so that q is equal to a prime or a
power of a prime). The code is linear if the codewords are all the solutions to a set of r homogeneous
linear equations, called generalized parity-check equations. The coefficients of these equations are elements
from X. Let k = n - r. If the equations are linearly independent, M = q^k, R = k/n, and the code is
termed an (n,k) code. A code which is not linear is said to be nonlinear.


The Hamming distance between two n-vectors is equal to the number of components in which these vectors differ.
For a linear code, the number of codewords at Hamming distance i, i = 0, 1, 2, ..., n, from any given codeword
is equal to the number of codewords of weight i. The minimum Hamming distance between a pair of distinct
codewords in a code, d_min (or the minimum weight of a linear code), yields important information regarding
the capability of the code to correct and detect random errors. A code can correct all patterns of t or fewer
random errors and in addition detect all patterns having no more than d errors (where d ≥ t) provided that

d + t + 1 ≤ d_min.

If the code is used for error correction only then d = t and the code can correct all patterns of t or fewer
random errors provided that

2t + 1 ≤ d_min.

2.1.  Example  of  a Binary  Block  Code 

To illustrate the ideas introduced in the previous section we consider the following simple example of a
binary (q = 2), (7,3) code. Such a code has M = 2^3 = 8 code words, each of block length 7. If x = (x_1,
x_2, x_3, x_4, x_5, x_6, x_7) represents a code word in the code, and if these symbols satisfy the following set of
linear (parity check) equations (+ means modulo 2 sum)


then  the  8 code  words  are: 

0 0 0 0 0 0 0
1 0 0 1 1 0 1
0 1 0 1 0 1 1
0 0 1 1 1 1 0
1 1 0 0 1 1 0
0 1 1 0 1 0 1
1 0 1 0 0 1 1
1 1 1 1 0 0 0.

The  minimum  distance  of  the  code  is  4 which  is  the  minimum  weight  of  any  nonzero  code  word.  (In  this  special 
case  all  nonzero  code  words  have  the  same  weight  but  this  is  not  usually  the  case.)  Thus  the  code  can  correct 
a single  error  while  detecting  but  not  correcting  a double  error. 

Note that in this case, the first 3 digits in any code word can be considered as the message digits while
the  last  4 digits  which  are  calculated  from  the  first  3 are  the  redundant  digits  or  parity  digits. 
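The properties claimed for this example can be checked by enumeration. In the Python sketch below, the four parity rules in `encode` are not quoted from the text: they are one set of equations that we inferred from the eight listed code words, and the function and variable names are ours.

```python
from itertools import product

def encode(message):
    """Map 3 message digits to a 7-digit code word (modulo 2 arithmetic).

    The parity rules are inferred from the listed code words and are an
    assumption of this sketch.
    """
    x1, x2, x3 = message
    x4 = (x1 + x2 + x3) % 2
    x5 = (x1 + x3) % 2
    x6 = (x2 + x3) % 2
    x7 = (x1 + x2) % 2
    return (x1, x2, x3, x4, x5, x6, x7)

codewords = [encode(m) for m in product((0, 1), repeat=3)]

# For a linear code the minimum distance equals the minimum nonzero weight.
d_min = min(sum(c) for c in codewords if any(c))
t = (d_min - 1) // 2        # largest t with 2t + 1 <= d_min
d = d_min - t - 1           # largest d with d + t + 1 <= d_min

for c in codewords:
    print("".join(map(str, c)), "weight", sum(c))
print("d_min =", d_min, " corrects t =", t, " detects up to d =", d)
```

The values of t and d computed this way correspond to correcting a single error while detecting, but not correcting, a double error.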

2.2. Some Important Classes of Block Codes (PETERSON, W. W., WELDON, E. J., Jr., 1972)

The  following  is  a brief  summary  of  the  characteristics  of  some  important  classes  of  block  codes: 

2.2.1. Binary Hamming Codes (q = 2)

Let m be any positive integer ≥ 2. Then for each m there is a linear code with parameters
n = block length = 2^m - 1,
k = message digits = 2^m - 1 - m,
n - k = check digits = m.

These  codes  all  have  minimum  distance  equal  to  3 and  thus  can  correct  any  single  error  in  the  block  of  length 
n digits. 
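As a small worked illustration, the parameters of the first few binary Hamming codes can be tabulated directly from these formulas. The helper name `hamming_params` is hypothetical.

```python
def hamming_params(m):
    """Return (n, k, n - k) for the binary Hamming code with m check digits."""
    if m < 2:
        raise ValueError("m must be an integer >= 2")
    n = 2 ** m - 1
    k = n - m
    return n, k, m

for m in range(2, 7):
    n, k, checks = hamming_params(m)
    print(f"m = {m}: ({n},{k}) code, rate {k / n:.3f}")
```

The rate k/n approaches 1 as m grows, while the minimum distance stays fixed at 3.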

2.2.2. Bose-Chaudhuri-Hocquenghem (BCH) Codes

These are linear codes with coefficients from any field GF(q). Let m be any positive integer ≥ 1, let c be
any integer which divides q - 1 and let t be any positive integer. Then the code has parameters:

n = block length = (q^m - 1)/c,
n - k = check symbols ≤ 2mt for q > 2 (nonbinary codes),
n - k = check symbols ≤ mt for q = 2 (binary codes),
d_min = distance ≥ 2t + 1.

2.2.3. Reed-Solomon (RS) Codes

These are a special case of nonbinary BCH codes formed by choosing m = c = 1. These codes have parameters:
n = block length = q - 1,
n - k = check symbols = 2t = d_min - 1.

2.2.4. Simplex Codes

These are a special case of binary BCH codes formed by choosing c = 1. These codes have parameters:
n = block length = 2^m - 1,
k = message digits = m,
d_min = minimum distance = 2^{m-1}.

2.2.5. Golay Code

This is a special binary code that has a very high error correction capability for the amount of redundancy
utilized. It is also a special case of the BCH codes. It has parameters:

n = block length = 23,
k = message digits = 12,
d_min = minimum distance = 7.

The  code  thus  can  correct  3 random  errors  in  a block  of  23  digits.  The  code  is  often  used  as  a (24,12) 
code  by  adding  an  extra  parity  digit  which  is  a parity  check  over  all  digits  in  the  block.  The  resultant 
(24,12)  code  then  has  minimum  distance  equal  to  8. 

2.2.6.  Majority  Logic  Decodable  Codes 

These are a class of codes that because of the special form of their parity check equations lead to a particularly
simple decoding algorithm. (See the next section for a further discussion.)

2.3. Decoding of Block Codes (WOLF, J. K., 1973)

If the received word were always an exact replica of the transmitted word when a codeword is transmitted
over a communications channel, there would be no need for coding. Rather, a noisy communications channel
distorts the transmitted codewords in a stochastic manner. A channel with input n-vectors from X^n (the
space of sequences of n symbols from the input alphabet X) and output n-vectors from Y^n (where Y is the
output alphabet) can be described by a conditional probability distribution P_{Y|X}(y|x) for all x ∈ X^n and
y ∈ Y^n. Here the subscripts X and Y denote random n-vectors representing the input and output n-vectors for the channel,
and x and y are the specific values which can be assumed by these vectors.

A decoder is a device that implements a decoding rule for choosing among the transmitted code words on the
basis of the received vector y. A possible option, termed error detection, is to choose no codeword at all
if the received sequence is not a code word. This option is often utilized when the codeword can be retransmitted
or reread from memory. A particular decoding rule which always decodes to a codeword is the one
that chooses the codeword having the highest conditional probability of being transmitted, given the received
vector. If all codewords have equal probability a priori, then this rule, called a maximum-likelihood
decoding rule, chooses the codeword c_i for which P_{Y|X}(y|c_i) is the largest. A brute-force application
of this rule requires M calculations of the conditional-probability distribution. For a binary code of
block length n = 100 and rate R = 1/2, this works out to 2^{50} ≈ 10^{15} calculations, a hopeless task even with a
large computer. It is the algebraic structure of the codes that allows us to escape from this dilemma.

Most decoding rules for algebraic block codes do not realize a maximum-likelihood decoding rule. Rather,
they  decode  to  the  most  likely  codeword  only  if  the  noise  on  the  channel  is  not  too  large.  Otherwise  they 
utilize  the  option  of  not  decoding.  Such  a rule  is  called  a bounded-distance  decoding  rule. 
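The difference between the two rules can be seen on the (7,3) code of Section 2.1. The Python sketch below assumes hard decisions on a binary symmetric channel with crossover probability below 1/2, in which case choosing the most likely codeword is the same as choosing the codeword at minimum Hamming distance; the function names are ours.

```python
# The eight code words of the (7,3) example.
CODEWORDS = [
    (0, 0, 0, 0, 0, 0, 0), (1, 0, 0, 1, 1, 0, 1),
    (0, 1, 0, 1, 0, 1, 1), (0, 0, 1, 1, 1, 1, 0),
    (1, 1, 0, 0, 1, 1, 0), (0, 1, 1, 0, 1, 0, 1),
    (1, 0, 1, 0, 0, 1, 1), (1, 1, 1, 1, 0, 0, 0),
]

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def ml_decode(received):
    """Brute force: compare against all M codewords, always return one."""
    return min(CODEWORDS, key=lambda c: hamming_distance(received, c))

def bounded_distance_decode(received, t=1):
    """Return the codeword within distance t, or None (decline to decode)."""
    for c in CODEWORDS:
        if hamming_distance(received, c) <= t:
            return c
    return None

one_error = (1, 0, 0, 1, 1, 0, 0)    # 1001101 with the last digit inverted
two_errors = (0, 1, 0, 1, 1, 0, 1)   # 1001101 with the first two digits inverted

print(ml_decode(one_error), bounded_distance_decode(one_error))
print(ml_decode(two_errors), bounded_distance_decode(two_errors))
print("comparisons for n = 100, R = 1/2:", 2 ** 50)
```

For the double-error pattern the bounded-distance rule declines to decode, whereas the brute-force rule still returns some codeword (here chosen arbitrarily among equally distant candidates).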

The Berlekamp algorithm for decoding BCH codes (PETERSON, W. W. and WELDON, E. J., 1972) is a bounded-distance
decoding rule that requires that X = Y and that will decode correctly if and only if the Hamming distance
between the received vector and the transmitted codeword does not exceed (d_min - 1)/2.

A class of codes that are not as powerful as BCH codes but that allow a simpler decoding algorithm are the
majority-logic decodable codes. The generalized parity-check equations of these codes are based upon the
combinatorial configurations of finite geometries. In the simplest case, decoding for these codes is performed
on a symbol-by-symbol basis. For each symbol, several generalized parity-check equations are checked,
each equation predicting that the symbol be a particular element of GF(q). The field element receiving the
most votes is taken to be the correct value for that symbol. It has been shown that any decoding rule for
any code can be realized, in principle, by properly weighting the votes of generalized parity-check equations
(RUDOLPH, L. D., ROBBINS, W. E., 1972).

3.  TREE  CODES,  TRELLIS  CODES  AND  CONVOLUTIONAL  CODES  (WOLF,  J.  K.,  1973) 

Consider a tree as shown in Figure 1. The small circles are nodes, and the lines emanating from each node
are branches. We assume that every node has Q branches emanating from it. Associated with each branch is
a sequence of n_0 symbols from the alphabet {0, 1, 2, ..., q - 1}. A tree code is the set of (possibly infinite)
sequences obtained by concatenating the symbols on the branches of each unique path through the tree. Note
that although there are an infinite number of codewords in our code, the first n_0 symbols for every codeword
can assume only Q different realizations. Note further that if we truncated the tree by allowing each path
to contain only L branches, we would have a block code of block length n = n_0 L with M = Q^L codewords. (Here
the codewords may not all be distinct.) The rate of the tree code is defined as R_0 = (1/n_0) log_q Q.

We now introduce some structure in the tree. We assume that the tree is generated by a K-state machine
with states S_0, S_1, ..., S_{K-1}. The machine has inputs from the set {0, 1, ..., Q - 1} and outputs from
X^{n_0}. We assume the machine always starts in state S_0. The machine is thought to reside in a state until an input
is imposed. As a result of this input, the machine produces an output n_0-vector and assumes a next
state. This change of states and production of outputs is described by a state-transition table that lists,
for every state and every input, the next state and the corresponding output.

To obtain a tree code from a K-state machine, associate a state with each node. The input then determines
which of the Q branches to take from that node (state) to the next node (state). The n_0 symbols on each
branch are the outputs of the machine.

An example of a state-transition table for a four-state machine, with Q = 2, X = {0,1}, and n_0 = 2, and
its corresponding tree code is shown in Fig. 2(a) and (b). An input sequence and the corresponding codeword
are given in Fig. 2(c). Note that there are only four states, so that several of the nodes in the tree
can be collapsed into a single node. Upon collapsing these nodes, the tree forms a trellis, as shown in
Fig. 2(d). Thus we say a finite-state machine generates a trellis code.

Consider a trellis code where X is the finite field GF(q) and Q = q^{k_0}, 1 ≤ k_0 ≤ n_0. Each input can
be considered a k_0-vector with components from X = GF(q). Let the n_0 components of the outputs be a fixed
linear function of the k_0 components of the present input vector and the v k_0 components of the v immediately
preceding vectors. "Linear" here means a weighted sum of the components with respect to addition
and multiplication as defined in GF(q). The number of states of the machine need never exceed q^{v k_0}. The
resulting trellis code is said to be a convolutional code of constraint length v (or k_0 v). The rate of
the code is R_0 = (1/n_0) log_q Q = k_0/n_0.
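A minimal Python sketch of such an encoder for q = 2, k_0 = 1, n_0 = 2, v = 2 is given below. The tap vectors (generators 7 and 5 in octal) are a common textbook choice that we assume for illustration; they are not necessarily those of the code in Fig. 3, and the names `TAPS` and `conv_encode` are ours.

```python
# Each output digit is a modulo-2 weighted sum of the present input and
# the v = 2 immediately preceding inputs. Assumed taps: 111 and 101.
TAPS = ((1, 1, 1), (1, 0, 1))

def conv_encode(bits):
    """Encode a binary input sequence, starting from the all-zero state."""
    state = (0, 0)                  # the two most recent past inputs
    output = []
    for u in bits:
        window = (u,) + state       # present input followed by the memory
        for taps in TAPS:
            output.append(sum(g * w for g, w in zip(taps, window)) % 2)
        state = (u, state[0])       # shift the register
    return output

# Response to a single 1 followed by zeros.
impulse_response = conv_encode([1, 0, 0])
print(impulse_response)
```

The state is a pair of binary digits, so the machine has 2^{v k_0} = 4 states, and two output digits are produced per input digit, giving rate 1/2.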

A convolutional code is called systematic if k_0 of the output symbols are equal to the current input k_0-vector.
Otherwise the code is nonsystematic. Nonsystematic convolutional codes are superior to systematic
convolutional codes for maximum-likelihood decoding on a random-error channel. This surprising result is
related to the fact that every block code is equivalent to a systematic block code, but not every convolutional
code is equivalent to a systematic convolutional code.

Two distance measures have been suggested for convolutional codes. The first, d_min, is the minimum nonzero
Hamming distance between the first (v + 1) n_0 symbols of distinct codewords. The second, d_free, is the minimum
nonzero Hamming distance between distinct infinite-length codewords. The free distance d_free seems to
be more closely related to the performance of the code for the more powerful decoding algorithms.

Given a systematic convolutional code of minimum distance d_min, the first k_0 message digits can be decoded
correctly if t or fewer errors occurred in the first (v + 1) n_0 transmitted digits provided that

2t + 1 ≤ d_min.

The relationship between d_free and the error correction capability of the code is more obscure.

3.1. An Example of a Convolutional Code

The state-transition table and trellis of a convolutional code with parameters q = 2, k_0 = 1, n_0 = 2, v = 2
are given in Fig. 3(a) and (b). A realization of this finite-state machine in terms of a two-stage shift
register is given in Fig. 3(c). The code has d_min = d_free = 4.

3.2. Some Convolutional Codes

Very little is known about constructing tree or trellis codes that are not convolutional codes. Thus in this
section we restrict our attention to convolutional codes. Indeed, even for convolutional codes, there is a
scarcity of techniques for constructing good codes.

3.2.1. Single Error Correcting Binary Codes (q = 2)

Let v be any positive integer. Then the code has parameters:

n_0 = symbols per branch = 2^v,
k_0 = message symbols per branch = n_0 - 1,
Q = branches per node = 2^{k_0} = 2^{n_0 - 1},
d_min = distance = 3.

3.2.2. Double Error Correcting Binary Codes (q = 2)

This code is based upon a binary BCH code of minimum distance 6. For any positive integer m, it has parameters
n_0 = symbols per branch = 2^m - 1,
k_0 = message symbols per branch = 2^m - 2 - 2m,
v = constraint length = 1,
d_min = minimum distance = 6.

3.2.3. Self-Orthogonal Binary Codes (q = 2)

The construction of these codes is based upon difference triangles. They have parameters:

n_0 = symbols per branch = any integer,
d_min = minimum distance = any integer,
k_0 = message symbols per branch = n_0 - 1,
v = constraint length ≥ (n_0 - 1)(d_min - 1)(d_min - 2)/2.

3.2.4.  Computer  Generated  Codes 

Most good convolutional codes have been found by computer search rather than by algebraic construction
procedures.

3.3. Decoding of Tree, Trellis and Convolutional Codes

Sequential decoding is an efficient method for finding the most probable codeword in a tree code, given the
received sequence y, without searching the entire tree. In sequential decoding, the received alphabet Y need
not be equal to X. One begins at the first node and tentatively chooses the branch whose code symbols are
most likely to have produced that portion of the received sequence. A measure of the difference between the
tentatively chosen code symbols and the corresponding received sequence is retained. One proceeds by tentatively
choosing the most likely branch from each successive node until the rate of growth of the difference
measure  indicates  that  the  path  being  followed  is  incorrect.  One  then  backtracks  by  going  back  to  a previous 
node  and  taking  a less  likely  branch.  Backtracking  and  trying  alternate  paths  continues  until  a path  is 
found  on  which  the  rate  of  growth  of  the  difference  measure  is  satisfactory.  Of  course,  the  critical  factors 
in  this  approach  are  the  choice  of  the  proper  difference  measure  and  a procedure  to  decide  whether  the  rate 
of  growth  of  this  measure  is  or  is  not  satisfactory. 

An  interesting  modification  of  this  algorithm  is  the  stack  algorithm.  Here  the  decoder  stores  the  difference 
measure  on  several  paths  and  extends  that  path  which  appears  most  likely  to  be  correct.  When  that  path 
temporarily  loses  favor  because  of  the  rate  of  growth  of  its  difference  measure,  the  next  most  likely  path 
is  extended.  All  paths  investigated  are  stored  in  the  decoder  until  the  storage  capacity  of  the  decoder  is 
exceeded.  Then  the  least  likely  paths  are  dropped  from  consideration. 

Viterbi's  maximum-likelihood  decoder  for  convolutional  codes  makes  use  of  the  fact  that  there  is  a trellis 
structure  for  convolutional  codes  (VITERBI,  A.  J.,  1967).  In  fact,  it  applies  to  any  trellis  code,  not  just 
convolutional  codes.  The  essence  of  the  procedure  is  to  keep  only  one  path  to  any  node  in  the  trellis;  of 
course,  the  path  to  keep  is  the  most  likely  one.  The  discarded  paths  to  any  node  can  never  lead  to  the 
most likely codeword. If the trellis is generated from a K-state machine, only K paths ever need be retained
by the decoder.
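The survivor rule just described can be sketched in Python for hard decisions, where the most likely path is the one at minimum Hamming distance from the received sequence. The four-state rate 1/2 code used here (generators 7 and 5 in octal) is an assumed textbook example, not necessarily the code of Fig. 3, and the message is assumed to end with two zeros so that the encoder returns to the all-zero state. All names are ours.

```python
TAPS = ((1, 1, 1), (1, 0, 1))       # assumed generators, 7 and 5 in octal

def branch_output(u, state):
    """Two output digits for input u leaving the given state."""
    window = (u,) + state
    return [sum(g * w for g, w in zip(taps, window)) % 2 for taps in TAPS]

def encode(bits):
    state, out = (0, 0), []
    for u in bits:
        out.extend(branch_output(u, state))
        state = (u, state[0])
    return out

def viterbi_decode(received, n_bits):
    """Keep one survivor (distance, inputs) per state; K = 4 states here."""
    survivors = {(0, 0): (0, [])}
    for step in range(n_bits):
        r = received[2 * step: 2 * step + 2]
        extended = {}
        for state, (dist, path) in survivors.items():
            for u in (0, 1):
                out = branch_output(u, state)
                d = dist + sum(a != b for a, b in zip(out, r))
                nxt = (u, state[0])
                # Discard the less likely of the paths entering a node.
                if nxt not in extended or d < extended[nxt][0]:
                    extended[nxt] = (d, path + [u])
        survivors = extended
    return survivors[(0, 0)][1]     # the path ending in the all-zero state

message = [1, 0, 1, 1, 0, 0]        # last two zeros flush the register
sent = encode(message)
received = list(sent)
received[3] ^= 1                    # one channel error
decoded = viterbi_decode(received, len(message))
print(decoded == message)
```

At no point does the decoder hold more than four paths, one per state, regardless of the length of the message.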

Algebraic  decoding  algorithms  exist  for  certain  convolutional  codes.  Some  codes  are  majority  logic  decodable 
in  that  several  parity  checks  are  calculated  for  each  message  digit  and  a majority  vote  on  the  correctness 
of  the  digit  is  taken.  In  other  cases  a form  of  syndrome  decoding  is  employed. 

4.  PERFORMANCE 

Of  prime  interest  to  communications  engineers  is  the  increase  in  performance  furnished  by  coding  systems  as 
compared  to  uncoded  systems.  We  will  take  the  probability  of  error  in  our  binary  message  stream  (either  the 
bit error probability or the probability of error in a block of k message digits) as our measure of performance.

The efficacy of coding depends heavily on the particular communications channel. We will consider here two
different  channels.  In  the  first,  the  only  channel  perturbation  on  the  transmitted  signal  is  additive  white 
Gaussian  noise.  In  the  second,  we  will  assume  that  the  transmitted  signal  experiences  Rayleigh  fading  and 
also  is  corrupted  by  additive  Gaussian  white  noise.  We  consider  both  hard  and  soft  decision  receivers. 

4.1.  Additive  Gaussian  White  Noise  Channel  (Hard  Decisions) 

This channel model, which is a good approximation to transmission from deep space, has been well studied in
the literature. We will take as our baseline system an uncoded binary, phase-shift keyed system employing
coherent detection. For a bit error probability of 10^{-7}, an 11 db signal-to-noise ratio is required while
for a bit error probability of 10^{-5} the required ratio is about 9.6 db. (By signal-to-noise ratio we mean
the ratio of the received energy per bit to noise power density.)
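These baseline figures can be checked against the standard expression for coherent binary phase-shift keying on the additive white Gaussian noise channel, P_b = (1/2) erfc(sqrt(E_b/N_0)). The formula is textbook material rather than something stated in this paper, and the function name is ours.

```python
import math

def bpsk_bit_error_probability(ebn0_db):
    """Bit error probability of coherent BPSK at the given E_b/N_0 in db."""
    ebn0 = 10 ** (ebn0_db / 10)             # convert db to a power ratio
    return 0.5 * math.erfc(math.sqrt(ebn0))

for snr_db in (9.6, 11.0):
    print(f"{snr_db:5.1f} db -> P_b = {bpsk_bit_error_probability(snr_db):.2e}")
```

The computed values agree with the quoted operating points to within a fraction of a db.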

When we consider coded systems, we will assume that the information rate (in bits per second) for all systems
is fixed. Thus, the pulse durations of the uncoded and coded systems differ. The required ratio of received
energy per information bit to noise power density as measured in db for a block error rate of 10^{-7} for various
block codes is given in Table I. For each code we assume hard decisions at the receiver and bounded
distance decoding where the decoder corrects all error patterns containing t or fewer errors.

n      k      R       t     P_ew           (S/N)_R db   Comments
23     12     .522    3     1 x 10^{-7}    9.3          Golay code
21     12     .571    2     1 x 10^{-7}    10.0         BCH
31     16     .517    3     1 x 10^{-7}    9.3          BCH
45     29     .644    2     1 x 10^{-7}    9.3          BCH
31     21     .678    2     1 x 10^{-7}    10.3         BCH
63     36     .571    5     1 x 10^{-7}    8.0          BCH
63     39     .619    4     1 x 10^{-7}    8.5          BCH
63     45     .714    3     1 x 10^{-7}    9.0          BCH
73     45     .616    4     1 x 10^{-7}    8.5          BCH
127    92     .724    5     1 x 10^{-7}    8.0          BCH
127    71     .559    9     1 x 10^{-7}    7.0          BCH
255    179    .702    10    1 x 10^{-7}    7.0          BCH
255    115    .451    21    1 x 10^{-7}    6.5          BCH
1      1      1.000   0     1 x 10^{-7}    11.0         Uncoded

We note that codes of moderate complexity save approximately 2 to 3 db in required signal-to-noise ratio
over uncoded systems while the very complex (255,115) 21 error correcting code achieves a saving of 4.5 db.

It  is  to  be  noted  that  we  are  comparing  codes  with  different  block  lengths  and  that  we  have  fixed  the  block 
error  probability  and  not  the  bit  error  probability.  However,  essentially  the  same  result  is  obtained  when 
we fix the bit error probability. At a block error probability of 10^{-5} approximately 1 db less signal-to-noise
ratio is required.

For the same channel model, convolutional codes outperform block codes of the same rates. A rate 1/2 convolutional
code of long constraint length employing sequential decoding or the stack algorithm requires a
ratio  of  energy  per  bit  to  noise  power  density  of  approximately  4.5  db  in  order  to  achieve  a bit  error 
probability  of  10"',  This  is  a saving  of  5.5  db  over  the  uncoded  system  but  requires  a very  complex  decoder. 

Short constraint length convolutional codes employing Viterbi decoding also outperform block codes for this
channel. A binary rate 1/2 convolutional code of moderate constraint length requires a ratio of energy per
bit to noise power density of about 7 db in order to achieve a bit error probability of 10^{-5}. Shorter constraint
length codes require somewhat higher signal-to-noise ratio but savings of more than 3 db are obtained
for relatively simple codes (and decoding algorithms) (HELLER, J. A., JACOBS, I. M., 1971).

4.2. Additive Gaussian White Noise Channel (Soft Decisions)

For  an  additive  Gaussian  white  noise  channel,  the  maximum  likelihood  receiver  for  uncoded  bipolar  signalling 
consists  of  a matched  filter  followed  by  a threshold  decision  device.  In  the  previous  section  it  was 
assumed that such a detector was used for the coded case prior to the decoding circuitry. Thus the decoder
was presented with a sequence of 0's and 1's at its input.


It is well known that in the coded case the analog signal at the output of the matched filter prior to the
thresholding  contains  more  information  than  the  "hard  decisions"  emanating  from  the  threshold  device.  In 
fact  no  information  is  lost  by  the  matched  filtering  and  these  "soft  decisions"  at  the  output  of  the  matched 
filter  contain  all  the  information  required  to  make  a maximum  likelihood  decision  in  the  coded  case. 

As  a rule  of  thumb,  one  can  say  that  for  any  given  code,  one  achieves  an  additional  savings  of  approximately 
2 db  by  using  the  soft  decisions  at  the  decoder  input  rather  than  the  hard  decisions.  In  principle,  this 
2 db  savings  can  be  obtained  for  both  block  and  convolutional  codes.  In  practice,  however,  soft  decision 
decoding  is  much  easier  to  use  for  convolutional  codes  than  for  block  codes. 

At a bit error probability of 10^{-5}, the following table (HELLER, J. A., JACOBS, I. M., 1971) gives the performance
of some convolutional codes of constraint length 7 using the Viterbi algorithm and soft decision decoding.

Type             Rate    (Energy per bit/noise power density) db
Convolutional    3/4     4.4
Convolutional    1/2     5.5
Convolutional    1/3     4.0

Longer constraint length codes achieve even better performance, but for very long constraint lengths the
Viterbi algorithm is impractical and one must use sequential decoding or the stack algorithm.

Soft decision decoding of block codes theoretically shows comparable performance to convolutional codes, but
the complexity of decoding often makes such a scheme impractical. One can always build an optimum maximum
likelihood decoder with a decoding complexity proportional to the number of code words. For a binary (n,k) code,
this means the decoding complexity is proportional to $2^k$. Recently (WOLF, J. K., 1978) an optimum algorithm
was presented which has a complexity proportional to $2^{n-k}$ for such codes. Various sub-optimum algorithms
show promise.
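
The exhaustive decoder mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the (7,4) Hamming generator matrix, the antipodal mapping (0 to +1, 1 to -1) on a Gaussian channel, and all function names are hypothetical choices that do not come from the report. The decoder correlates the unquantized matched-filter outputs with every one of the $2^k$ code words, which is why its cost grows in proportion to $2^k$.

```python
from itertools import product

# Hypothetical example code: a systematic (7,4) Hamming code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(msg, gen=G):
    """Multiply the message row vector by the generator matrix over GF(2)."""
    n = len(gen[0])
    return tuple(sum(m * row[j] for m, row in zip(msg, gen)) % 2 for j in range(n))

def all_codewords(gen=G):
    """Enumerate all 2^k code words; the decoder below visits every one of them."""
    k = len(gen)
    return [encode(msg, gen) for msg in product((0, 1), repeat=k)]

def soft_ml_decode(received, codewords):
    """Return the code word whose antipodal image (0 -> +1, 1 -> -1)
    has the largest correlation with the soft matched-filter outputs."""
    def correlation(cw):
        return sum(r * (1 - 2 * c) for r, c in zip(received, cw))
    return max(codewords, key=correlation)

def hard_decisions(received):
    """Threshold each matched-filter output at zero."""
    return tuple(0 if r >= 0 else 1 for r in received)
```

When two weak outputs have the wrong sign, the hard decisions contain two errors, which a distance-3 code cannot correct, while the correlation decoder still recovers the transmitted word because the unreliable positions carry little weight in the sum.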


The performance of such codes improves with blocklength. The following table gives the required ratio of
energy  per  bit  to  noise  density  for  orthogonal  codes  using  soft-decision  maximum  likelihood  decoding  in 
order  to  achieve  a bit  error  probability  of  10”'. 

(n,k)                      (required signal-to-noise ratio) db

(8,3)         8            10 db
(16,4)        16           9 db
(32,5)        32           8.3 db
(64,6)        64           7.6 db
(1024,10)     1024         5.9 db
(2^15,15)     2^15         4.9 db
(2^∞,∞)       2^∞          -1.6 db (limiting case)

The  extended  Golay  (24,12)  code  requires  about  5.5  db  signal  to  noise  ratio  in  order  to  achieve  a bit  error 
probability  of  10  . 

4.3.  Rayleigh  Fading  Channel 

We  first  consider  a block  coding  scheme  (PIEPER,  J.  F.,  PROAKIS,  J.  G.,  REED,  R.  R.,  WOLF,  J.  K.)  where  the 
data  bits  are  represented  by  n bits  using  an  (n,k)  block  code.  These  n bits  are  assigned  to  n frequency 
slots so that if the bit is a 1 that frequency is transmitted, while if it is a 0 the frequency is not transmitted.
It is assumed that the frequencies fade independently in accordance with Rayleigh statistics and
that  all  frequency  channels  are  corrupted  by  independent  Gaussian  noise  of  flat  spectrum. 

To decode, the squared magnitudes of the responses of the matched filters corresponding to the n frequency
cells are first formed. We call these "decision variables". Then, for each code word, the decision variables
corresponding to 1's in the code word are summed. If all the code words have the same Hamming weight,
that is, the same number of 1's, then the maximum likelihood decoder decodes to that code word having the
largest sum of decision variables. Such a scheme is only practical for moderate values of k (say k < 10).

A similar transmission scheme can be considered for convolutional codes. For example, for a rate 1/2 code,
every message digit corresponds to two channel symbols and thus two frequencies.

Curves  of  performance  for  both  block  and  convolutional  codes  are  shown  in  Figure  4.  It  is  seen  that  the 
savings  in  signal-to-noise  ratio  achieved  by  coding  is  much  greater  here  than  in  the  non-fading  channel 
model.

5.  SUMMARY 

The  purpose  of  this  paper  was  to  present  an  overview  of  various  coding  techniques  available  for  error  control 
over noisy communications channels. The parameters of several common codes were given. The performance of
these codes for signalling over two common communications channels was then presented.

Figure 4. Probability of error versus signal-to-noise ratio per information bit (in db) for certain codes
in a Rayleigh fading environment

Abstract: The design of a coding/modulation scheme for digital
communications over a Rayleigh fading channel, the structure of the
corresponding decoder, and the error rate performance of the resulting
system are considered. Emphasis is on the use of constant weight codes for
constructing equal energy waveforms for transmission over the channel.
The performance gains that are achieved by the integrated coding/modulation
approach relative to conventional methods for obtaining diversity are
illustrated via some examples. Of special interest is the use of a
concatenated coding technique for forming codes of large distance and hence
high diversity. A new decoding algorithm is applied to enable efficient
decoding of the concatenated code. An example is included that shows a
performance increase of several dB resulting from concatenation.


I.  Introduction 

An integrated coding/modulation approach for digital
transmission over a Rayleigh fading channel is presented.
Our main purpose is to illustrate the performance gains that
can be achieved by such an approach relative to conventional
methods that rely on standard diversity techniques for
obtaining a desired reliability in transmission. Some
well-known codes are used as examples to illustrate the
benefits of the integrated coding/modulation approach. Of
particular significance is the additional gain in performance
achieved by concatenated coding. A computationally efficient
algorithm for decoding a high-rate concatenated code is also
presented.

We  do  not  consider  the  channel  itself  in  detail.  Instead 
we  postulate  in  this  section,  as  a background  to  the  rest  of 
the  paper,  certain  properties  of  the  channel  and  of  the 
resulting  signal  structure.  We  offer  the  underwater 
acoustic communications channel as an example of the
type  of  channel  we  describe. 

We assume that a signaling space is available that is
partitioned into time/frequency cells. Within a cell a tone
may be transmitted. We invoke the usual simplifying
idealizations that the fading and (additive white Gaussian)
noise processes are independent and identically distributed
among all the cells. We also assume sufficient separation
(guard space) between adjacent cells so that intercell
interference can be considered negligible. We note that
this separation often follows directly from the assumption
of independent fading that requires that the cells be
separated by at least the coherence time and coherence
bandwidth of the channel. As required, messages can be
interleaved to prevent this separation of cells from
resulting in an excessively inefficient utilization of the
signaling space. We choose not to presume the ability to
establish phase references for the cells in a signal. Rather,
we limit ourselves to the important special case wherein
coherent combination is not possible. For an application
where this restriction is not necessary, the extension of our
results is direct.

In Section II, we consider the design of a generic
coding/modulation structure appropriate for the channel
and the corresponding maximum likelihood decoder. In
Section III, we briefly describe several methods for forming
codes for the fading channel and introduce a particularly
useful technique based upon concatenation. In Section IV,
we derive upper bounds on the average error probability
for the maximum likelihood (soft-decision) decoder for
block codes and convolutional codes. Some performance
results are presented that illustrate the advantages of an
integrated coding/modulation approach relative to
conventional diversity techniques. The added performance
gain achieved by concatenation is illustrated by means of
an example in which a Reed-Solomon code is used as the
outer code and a Hadamard code is used as the inner code.
Finally in Section V, we present an efficient decoding
algorithm appropriate for the soft-decision decoding of
(high-rate) concatenated codes.

II. Integrated Coding/Modulation Design

A model  of  the  digital  communications  system  that  we 
will  consider  is  shown  in  Fig.  1.  The  transmitter  employs  a 
combined  encoder/modulator  to  generate  waveforms 
from the input data bits. For this part of the discussion,
we restrict ourselves to the use of block coding. The block
encoder accepts k information bits at a time and maps
them into blocks of n bits. We employ the usual notation
of an (n,k) code¹ for a binary block code wherein each
codeword is composed of n bits and conveys k bits of
information. The number of different information-bearing
codewords possible in an (n,k) code is $2^k$. If the total
number of words satisfying the defining properties of a
code is not a power of two (which implies that the code is
nonlinear), then the $2^k$ code words that can be used to
convey binary data are chosen as a proper subset of the
entire code. For the sake of generality, we denote the
number of codewords as M.

0018-9448/78/0700-0457$00.75 ©1978 IEEE

The modulator accepts the block of n bits corresponding
to a codeword and assigns each bit to a cell in the
partitioned signal space. Waveforms are constructed by
following the convention that a tone pulse is generated in
a cell if a “one” bit is assigned to that cell and that no
energy is transmitted in a cell to which a “zero” is
assigned. This choice of basic cellular modulation is based
on our assumed inability to detect received tones
coherently. The waveform type that results for an ensemble of
such signals is termed multitone on/off keying
(MTOOK). Because of the assumed mutual statistical
independence of the fading in the cells, the performance
of the communications system will not depend upon the
particular assignment of message bits to signaling cells.
Accordingly we choose the most direct model whereby the
bits composing one codeword are all transmitted within
one time slice.

An alternative mapping of the n bits into channel
waveforms can be accomplished by means of frequency-shift
keying (FSK). With FSK, each bit in a codeword is
assigned two cells: one cell for a “one” and the other for a
“zero.” Consequently a total of 2n cells are required to
transmit the code block of n bits. In this mapping, n out
of the 2n cells will be keyed on for each code block. The
type of waveform that results from this mapping is termed
multitone FSK (MTFSK).

It should be noted at this point that the MTFSK
waveform is identical to the MTOOK waveform if the
latter is generated by replacing each “zero” in the block of
n bits by 01 (0→01) and each “one” by 10 (1→10). The
result of this mapping (0→01, 1→10) is simply to double
the block length and the minimum distance of the code.
In addition, the mapping results in a fixed weight code of
weight equal to n, i.e., all of the length 2n codewords have
Hamming weight n.
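
The effect of the mapping can be checked numerically with a minimal sketch. The toy (3,2) even-parity code and the function names below are hypothetical and serve only to illustrate the doubling of block length and minimum distance and the resulting fixed weight n.

```python
from itertools import combinations

def to_constant_weight(codeword):
    """Replace every 0 by the pair (0, 1) and every 1 by the pair (1, 0)."""
    out = []
    for bit in codeword:
        out.extend((1, 0) if bit else (0, 1))
    return tuple(out)

def hamming_distance(a, b):
    """Number of positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(code):
    """Brute-force minimum distance over all pairs of words."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))

# Hypothetical toy code: the (3,2) even-parity code (word weights 0 and 2).
toy_code = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
mapped = [to_constant_weight(cw) for cw in toy_code]
```

For this toy code every mapped word has length 6 and weight 3, and the minimum distance goes from 2 to 4, since each position where two words differ becomes a pair of differing positions.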

¹It is customary to use this notation only for linear block codes, but
for convenience we use the notation (n,k) for any block code where n
and k are defined above.


Since the MTFSK and MTOOK waveforms are rendered
identical under the mapping given above, it is
unnecessary to consider the two waveforms separately.
Instead, we arbitrarily choose the term MTOOK to
describe the signal waveforms generated by either
modulator.


The receiver we choose employs maximum likelihood
(soft-decision) decoding to decide to which of the M
possible codewords a received waveform corresponds. The
received waveform is separated into n spectral resolution
cells corresponding to the available tone frequencies at the
transmitter as shown in Fig. 1. Thus there are n filters, the
mth filter being matched to the mth frequency position in
the MTOOK signal structure. Since the Rayleigh fading
and the additive white Gaussian noises are mutually
statistically independent and identically distributed random
processes, the maximum likelihood criterion requires that
these filter responses be noncoherently (square-law)
detected and combined to form the log-likelihood terms for
each of the M hypotheses. The codeword corresponding
to the maximum of these terms is then selected [1].

The computation of the log-likelihood quantities is
extremely simple when all the M waveforms have exactly
the same average received energy. Under this condition
and for mutually independent and identically distributed
Rayleigh fading among the n cells as postulated in the
previous section, the receiver does not have to know (or
measure) the signal-to-noise ratio in each received cell
and no bias compensation is required in the computation
of the log-likelihood quantities [2]. This is such an
important consideration in any practical implementation of
the receiver that we impose it as a requirement for the M
waveforms.

The condition of equal average energy among the M
waveforms is satisfied if every one of the M codewords
has exactly w ones and n-w zeros, i.e., if the code has
constant weight w. Since the mapping 0→01 and 1→10
can be used to convert any variable weight code to a
constant weight code, there is really no restriction on our
choice of codes.

We now complete the discussion of the computation of
the log-likelihood quantities under the constraint of
equal-energy waveforms. Let $|u_m|^2$ be the square of the
envelope of the output of the mth matched filter. Let the
mth element of the ith codeword be denoted $x_{im}$, where $x_{im}$
is either 0 or 1. The ith codeword is then the binary vector
$x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, and there are M such vectors for
$i = 1, 2, \ldots, M$. It can be easily shown [2] that if the M signal
patterns (corresponding to the M codewords) are
transmitted with equal a priori probability, then the decoder
that achieves the smallest possible probability of error in
choosing the correct codeword based upon the received
waveform is the one that computes the M decision
variables (log-likelihood quantities)

$$U_i = \sum_{m=1}^{n} x_{im}\,|u_m|^2, \qquad i = 1, 2, \ldots, M \tag{1}$$

and chooses the codeword corresponding to the index i
for which the summation is a maximum.
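
The decision rule can be sketched in a few lines. The function names, the four-word example code, and the simulation helper are hypothetical. In particular, the helper assumes noise power normalized to one, so that the squared envelope is exponentially distributed with mean 1 in an empty cell and mean 1 + (SNR per chip) in a keyed cell; this normalization is an assumption made for illustration, not a statement taken from the paper.

```python
import random

def decision_variables(envelopes_sq, codewords):
    """For each codeword, sum the squared envelopes of the cells where it has a 1."""
    return [sum(x * e for x, e in zip(cw, envelopes_sq)) for cw in codewords]

def ml_decode(envelopes_sq, codewords):
    """Return the index of the codeword with the largest decision variable."""
    u = decision_variables(envelopes_sq, codewords)
    return max(range(len(codewords)), key=u.__getitem__)

def simulate_cells(codeword, chip_snr, rng):
    """Assumed channel model: exponential squared envelopes with mean 1 in
    empty cells and mean 1 + chip_snr in keyed cells (unit noise power)."""
    return [
        rng.expovariate(1.0 / (1.0 + chip_snr)) if bit else rng.expovariate(1.0)
        for bit in codeword
    ]

# Hypothetical constant weight code: four words of length 4 and weight 2.
CODE = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)]

measured = [0.2, 1.9, 0.1, 2.4]          # squared envelopes of the n = 4 filters
best = ml_decode(measured, CODE)         # index of the chosen codeword
cells = simulate_cells(CODE[0], chip_snr=10.0, rng=random.Random(1))
```

Because all words have the same weight, no knowledge of the signal-to-noise ratio enters the comparison; the receiver only adds and compares.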

Although the discussion given above assumed the use of
a block code, a convolutional encoder could be
substituted for the block encoder shown in Fig. 1. For
example, if a binary convolutional code is selected, then the
output sequence of zeros and ones may be transmitted by
binary FSK, or equivalently, by on/off keying if the
mapping 0→01 and 1→10 is performed on the output
sequence of the encoder. The maximum likelihood
(soft-decision) detection criterion for the convolutional code
can be efficiently implemented by means of the Viterbi
algorithm when the metric corresponding to any path
through the tree or trellis is chosen as a linear sum of
squared envelopes selected from the two matched filter
outputs [3]. Therefore the substitution of a convolutional
encoder for a block encoder is straightforward.
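
A minimal sketch of such a decoder is given below. The rate-1/2, constraint length 3 code with generators 7 and 5 (octal), the zero-tail termination, and all names are assumptions chosen for brevity; the paper does not specify a particular code here. Each coded bit is assumed to be sent by binary FSK, so the receiver holds a pair of squared envelopes per coded bit, and the branch metric adds the envelope of the filter that the hypothesized bit would have keyed.

```python
def conv_encode(bits):
    """Rate-1/2 encoder, generators 7 and 5 (octal), terminated with two zeros."""
    s1 = s2 = 0
    out = []
    for b in list(bits) + [0, 0]:
        out.append(b ^ s1 ^ s2)   # generator 111
        out.append(b ^ s2)        # generator 101
        s1, s2 = b, s1
    return out

def viterbi_square_law(cells, n_msg_bits):
    """cells[t] = (e0, e1): squared envelopes of the 'zero' and 'one' filters
    for coded bit t. Returns the message whose path has the largest summed
    squared envelope."""
    n_steps = len(cells) // 2
    metric = {0: 0.0}             # state packed as 2*s1 + s2; start in state 0
    paths = {0: []}
    for t in range(n_steps):
        new_metric, new_paths = {}, {}
        allowed = (0, 1) if t < n_msg_bits else (0,)   # tail bits are zero
        for state, m in metric.items():
            s1, s2 = state >> 1, state & 1
            for b in allowed:
                c0, c1 = b ^ s1 ^ s2, b ^ s2
                branch = cells[2 * t][c0] + cells[2 * t + 1][c1]
                nxt = (b << 1) | s1
                cand = m + branch
                if cand > new_metric.get(nxt, float("-inf")):
                    new_metric[nxt] = cand
                    new_paths[nxt] = paths[state] + [b]
        metric, paths = new_metric, new_paths
    best = max(metric, key=metric.get)
    return paths[best][:n_msg_bits]
```

The only change relative to a Viterbi decoder for the Gaussian channel is the branch metric, which is here a sum of squared envelopes rather than a correlation.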

An important parameter in a block code is the
minimum (Hamming) distance, denoted by $d_{\min}$. For a
convolutional code, the distance parameter of interest is the
minimum free distance, denoted by $d_{\mathrm{free}}$. The dependence
of the error probability on the distance parameter is given
in Section IV. Another important parameter is the code
rate, defined as the ratio k/n. In our comparison of code
performance, we find it convenient to use the reciprocal
(n/k) of this ratio, which we call the bandwidth expansion
factor.

We  now  present  several  methods  for  constructing  codes 
that  result  in  equal-energy  waveforms. 

III.  Methods  for  Constructing  Codes  That 
Result  in  Equal-Energy  Waveforms 

Several  methods  for  generating  codes  that  result  in 
equal-energy  waveforms  are  presented  in  this  section.  We 
refer  to  the  resulting  codes  as  constant  weight  codes.  Since 
by  definition  a constant  weight  code  cannot  contain  the 
all zero (identity) codeword, such a code must be nonlinear.
Nonetheless, by the use of an appropriate nonlinear
operation  a constant  weight  code  may  be  constructed 
from  a linear  code.  Thus  much  of  the  prior  coding  art 
may  be  applied  to  our  problem.  We  briefly  describe 
several  methods  by  which  constant  weight  codes  can  be 
constructed.  This  discussion  is  by  no  means  exhaustive. 

Method 1: Nonlinear Transformation of a Linear Code

In general, if in each word of an arbitrary binary code
we substitute one binary sequence for every occurrence of
a zero and another sequence for each one, a constant
weight binary block code will be obtained if the two
substitution sequences are of equal weights and lengths. If
the length of the sequence is $\nu$ and the original code is an
(n,k) code, then the resulting constant weight code will be
a ($\nu n$,k) code. The weight will be n times the weight of
the substitution sequence, and the minimum distance will
be the minimum distance of the original code times the
distance between the two substitution sequences. Thus the
use of complementary sequences when $\nu$ is even results in
a code with minimum distance $\nu d_{\min}$ and weight $\nu n/2$.

TABLE I

Examples of Constant Weight Code Formed by Method 1

Code Parameters    Original Golay    Constant Weight

n                  24                48
k                  12                12
M                  4096              4096
d_min              8                 16
w                  variable          24

The simplest form of this method is the case $\nu = 2$
described in the previous section, where every 0 is
replaced by the pair 01 and every 1 is replaced by the
complementary sequence 10 (or vice versa). As an
example, we take as the initial code the (24,12) extended Golay
code. The parameters of the original and of the resultant
constant weight code are given in Table I.

We note that this substitution process can be viewed as
a separate encoding. This secondary encoding clearly does
not alter the information content of a codeword; it
merely changes the form in which it is transmitted. Since
the new codeword is composed of pairs of bits, one “on”
and one “off,” MTOOK transmission of this codeword
produces a MTFSK waveform as indicated in the
previous section.

The substitution of complementary binary sequences
for the output sequence of zeros and ones from a binary
convolutional encoder also results in equal-energy
waveforms. Hence this method is not restricted to block codes,
unlike the next two methods that are presented.

Method  2:  Expurgation 

In this method we start with an arbitrary binary block
code and select from it a subset consisting of all words of
a certain weight. Several different constant weight codes
can be obtained from one initial code by varying the
choice of the weight w. Since the codewords of the
resulting expurgated code can be viewed as a subset of all
possible permutations of any one codeword in the set,
the term “binary expurgated permutation modulation”
(BEXPERM) has been coined by Gaarder [2] for
describing such a code. In fact, the constant weight binary block
codes constructed by the other methods may also be
viewed as BEXPERM codes. This method of generating
constant weight codes is in a sense opposite to the first
method in that the word length n is held constant and the
code size M is changed. The minimum distance for the
constant weight subset will clearly be no less than that of
the original code. As an example, we again consider the
(24,12) extended Golay code and form the two different
constant weight codes shown in Table II.
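
Expurgation of the extended Golay code can be sketched as follows. The construction used here (a cyclic (23,12) code generated by the polynomial $x^{11}+x^{10}+x^6+x^5+x^4+x^2+1$, extended by an overall parity bit) is one standard way to obtain a (24,12) Golay code and is an assumption of this sketch; the paper does not say which representation it uses. Codewords are stored as integers and all function names are hypothetical.

```python
GOLAY_GENERATOR = 0b110001110101  # x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1

def gf2_multiply(a, b):
    """Carry-less (GF(2)) polynomial product of two integers used as bit vectors."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def weight(word):
    """Hamming weight of a word stored as an integer."""
    return bin(word).count("1")

def extended_golay_codewords():
    """All 4096 words of a (24,12) extended Golay code as 24-bit integers."""
    words = []
    for msg in range(1 << 12):
        cw = gf2_multiply(msg, GOLAY_GENERATOR)   # word of the (23,12) cyclic code
        words.append((cw << 1) | (weight(cw) & 1))  # append overall parity bit
    return words

def expurgate(words, w):
    """Keep only the words of Hamming weight w."""
    return [word for word in words if weight(word) == w]

def min_distance(words):
    """Brute-force minimum pairwise Hamming distance."""
    return min(weight(a ^ b) for i, a in enumerate(words) for b in words[i + 1:])
```

Calling `expurgate(extended_golay_codewords(), 8)` or `expurgate(..., 12)` yields constant weight subsets of the kind described in the text; by the well-known weight enumerator of the extended Golay code these subsets contain 759 and 2576 words, respectively. From such a subset one would then retain a power-of-two number of words to carry binary data.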


TABLE IV

Parameters of Concatenated Code


Method  3:  Hadamard  Matrices 


This method might appear to form a constant weight
binary block code directly, but it actually is a special case
of the method of expurgation. In this method, a Hadamard
matrix is formed, and a constant weight code is
created by selection of rows (codewords) from this matrix
[4], [5]. An alternative method is to start with a Hadamard
polynomial and to generate codewords corresponding to
cyclic shifts of the coefficient vector. These two methods
are equivalent since the matrix may be defined as being
composed of row vectors determined by cyclic shifts of
the polynomial coefficient vector.

A Hadamard matrix is an n × n matrix (n an even
integer) of ones and zeros with the property that any row
differs from any other row in exactly n/2 positions. (If, as
is sometimes the convention, the elements of the matrix
are either +1 or -1, then the defining property of the
matrix is that the rows are all mutually orthogonal.) One
row of the matrix is normally chosen as being all zeros.
The other rows are then half zeros and half ones. A
constant weight binary block code is obtained by selecting
these latter n-1 rows. This code can be extended to a
code of size M = 2(n-1) by including the complements of
these rows. We will refer to such a constant weight code
as a Hadamard code, denoted H(n,k), although strictly
speaking one should define a Hadamard code as the code
of size M = 2n consisting of all rows and their
complements of the Hadamard matrix.
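
As an illustration, a constant weight code of length 20 can be built from a Hadamard matrix of order 20. The sketch below uses the Paley construction for a prime q congruent to 3 modulo 4 (here q = 19); this is an assumption of the example, since the paper does not state how its matrix is obtained, and the function names are hypothetical.

```python
def paley_hadamard(q):
    """Paley type-I Hadamard matrix of order q + 1 with +1/-1 entries,
    for a prime q congruent to 3 modulo 4."""
    residues = {(x * x) % q for x in range(1, q)}

    def chi(a):
        """Quadratic character modulo q."""
        a %= q
        if a == 0:
            return 0
        return 1 if a in residues else -1

    n = q + 1
    h = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or i == 0:
                h[i][j] = 1            # diagonal and first row are +1
            elif j == 0:
                h[i][j] = -1           # first column below the diagonal
            else:
                h[i][j] = chi(i - j)   # Jacobsthal part
    return h

def hadamard_code(q):
    """The n - 1 binary rows of weight n/2 together with their complements."""
    rows = [tuple((1 - v) // 2 for v in row) for row in paley_hadamard(q)]  # +1 -> 0, -1 -> 1
    base = rows[1:]                    # drop the all-zero first row
    complements = [tuple(1 - b for b in row) for row in base]
    return base + complements

code = hadamard_code(19)               # 38 words of length 20
```

With q = 19 this gives 2(n-1) = 38 words of length 20 and weight 10, from which 32 words can be selected to carry 5 bits, consistent with the H(20,5) code used later in the paper.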

Hadamard matrices (and hence Hadamard codes) have
been shown to exist for n = 2, 4, and all multiples of 4 up
to 200 with a few possible exceptions [4]. The properties of
a general constant weight Hadamard code and of a
specific code to which we shall later refer are given in
Table III, where ⌊·⌋ denotes the greatest integer less than
or equal to the enclosed number.


Method  4:  Concatenation 


In this method we begin with two block codes, one
binary and the other nonbinary. The binary code is called
the inner code and is an (n,k) constant weight (nonlinear)
block code. The nonbinary code (that may be linear) is
known as the outer code. To distinguish it from the inner
code, we use upper case letters and brackets, e.g., an
[N,K] code, where N and K are measured in terms of
symbols from a q-ary alphabet. The size q of the alphabet
over which the outer code is defined cannot be greater
than the number of words in the inner code. Using these
two codes we form a constant weight binary block code
by assigning to each symbol in the nonbinary alphabet a
distinct word from the inner code. The outer code, when
defined in terms of the binary inner codewords rather
than q-ary symbols, is the new code. We refer to this
process as concatenation because of its similarity to the
coding scheme introduced by Forney [6], to whom the
terminology of inner and outer codes is also due.

Parameter              Concatenated code

word length            nN
information content    K log2 q
code size              q^K
minimum distance       d_min,i × d_min,o
weight                 N w_i

That  the  concatenated  code  which  results  is  a constant 
weight  binary  block  code  is  easy  to  verify.  We  call  it  an 
[N,  K](n,k)  code  to  indicate  the  salient  features  of  the 
outer  and  inner  codes  from  which  it  is  formed.  The 
parameters  of  the  concatenated  code  are  given  in  Table 
IV,  where  the  subscripts  i and  o denote  inner  and  outer, 
respectively. 

An important special case is obtained when q is equal
to $2^k$ and the inner code size is chosen to be $2^k$. Then the
number of words is $M = 2^{kK}$, and the concatenated
structure is an (nN,kK) code. The bandwidth expansion factor
of this concatenated code is the product of the bandwidth
expansions for the inner and outer codes.
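
The parameter relations for this special case can be collected in a small helper. The function and key names are hypothetical, and the example values are those of the H(20,5) inner code and the [9,8] outer code discussed in the paper; the minimum distance is taken as the product of the inner and outer minimum distances.

```python
from fractions import Fraction

def concatenated_parameters(n, k, d_inner, w_inner, N, K, d_outer):
    """Parameters of an [N,K](n,k) concatenated code for the case q = 2**k."""
    return {
        "word_length": n * N,
        "information_bits": k * K,
        "codewords": 2 ** (k * K),
        "min_distance": d_inner * d_outer,
        "weight": w_inner * N,
        # Product of the inner (n/k) and outer (N/K) bandwidth expansions.
        "bandwidth_expansion": Fraction(n, k) * Fraction(N, K),
    }

params = concatenated_parameters(n=20, k=5, d_inner=10, w_inner=10, N=9, K=8, d_outer=2)
```

For these values the helper gives a (180,40) code of weight 90 with minimum distance 20 and a bandwidth expansion of 4 × 1.125 = 4.5.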

In this paper we will consider one specific type of outer
code that is especially well suited to the concatenation
technique. This is a single parity symbol Reed-Solomon
code defined over an alphabet of arbitrary size q with
arithmetic modulo-q. (Strictly speaking when q is not a
power of a prime, this code is not a Reed-Solomon code.
Further, when q is not a prime but only a power of a prime,
the arithmetic of a Reed-Solomon code is not modulo-q
but is that for GF(q). However, for the single parity case,
these restrictions are not required and all of the properties
of a true Reed-Solomon code apply to the code we use.)
A word in this [N,N-1] code consists of N-1 information
symbols and a single parity symbol chosen so that the
modulo-q sum of all N symbols is zero. The minimum
distance of this outer code is two. The value of N is
arbitrary and can be chosen as convenient. The alphabet
size q is chosen the same as the size of the particular inner
code to be used.

TABLE V

Example of Concatenated Code and Its Parameters

Code Parameter         inner                outer            concatenated

word length            20                   9                180
information content    5                    8                40
code size              select 32 from 38    32^8 = 2^40      2^40
minimum distance       10                   2                20
weight                 10                   -                90
bandwidth expansion    4                    1.125            4.5

As a specific example, which we will later examine in
detail, we consider the concatenation of the [9,8]
Reed-Solomon code with the H(20,5) code. By defining the
[9,8] code over a 32-ary alphabet we have a concatenated
code with parameters given in Table V.

We may picture the parity symbol in the outer code as
being appended to the first N-1 symbols. A useful
interpretation of the concatenation process, based on the
systematic property of this outer code, then results. N-1
words from the inner code are transmitted in exactly the
same manner that they would be if only the inner code
were being used. The only effect that the concatenation
has is to insert a parity symbol into the data stream. The
overall data rate is decreased by only a factor of
(N-1)/N, yet the minimum distance is doubled. Later we
show that the associated increase in decoder complexity is
not as great as might be expected, and we give an example
of the performance increase that can be obtained in this
manner.
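
The systematic interpretation can be illustrated with a deliberately small sketch. The inner code below (four words of length 4, weight 2, minimum distance 2) and the choice N = 3 are hypothetical stand-ins for the H(20,5) and [9,8] codes of the paper, and the function names are likewise assumptions.

```python
from itertools import combinations, product

# Hypothetical inner code: q = 4 words of length 4, weight 2, minimum distance 2.
INNER = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 1, 0, 0), (1, 0, 1, 0)]

def outer_encode(symbols, q):
    """Append one parity symbol so that the modulo-q sum of all N symbols is zero."""
    return list(symbols) + [(-sum(symbols)) % q]

def concatenated_encode(symbols, inner=INNER):
    """Map N - 1 information symbols to a binary word of length n * N."""
    word = []
    for s in outer_encode(symbols, len(inner)):
        word.extend(inner[s])
    return tuple(word)

def min_distance(code):
    """Brute-force minimum pairwise Hamming distance."""
    return min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(code, 2))

# All 4^2 = 16 words of the toy [3,2](4,2) concatenated code.
code = [concatenated_encode(msg) for msg in product(range(4), repeat=2)]
```

The first N-1 inner words of each encoded block are exactly what the inner code alone would have sent; only the final inner word, carrying the parity symbol, is added. In this toy case the minimum distance rises from 2 to 4 while the rate drops by the factor 2/3.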

We note that, instead of using a nonbinary block code
as the outer code, one can use a nonbinary convolutional
code in a similar concatenation procedure. One such code
that is suitable is a dual-k convolutional code [7]. For
example, a dual-5 code can be used as the outer code and
the H(20,5) block code used as the inner code.

IV.  Upper  Bounds  on  the  Probability  of  Error 

In this section we present upper bounds on the average
error probability for equal energy waveforms constructed
as described in the previous section that are transmitted
over a Rayleigh fading channel. Some examples are given
to illustrate the gains in performance achieved by the
integrated coding/modulation approach when compared
with conventional M-ary orthogonal waveforms.


For reference, we give an upper bound on the error rate
performance for an M-ary orthogonal waveform set with
diversity. Then we derive an upper bound on the
probability of error for a constant weight binary block code
transmitted via MTOOK over a Rayleigh fading channel.²
The performance of the concatenated code described in
the previous section is also derived and evaluated. Finally,
we give the error rate performance of binary
convolutional codes in terms of their generating function.

Orthogonal  Waveforms 

Orthogonal waveforms with diversity D are constructed
by assigning D unique (nonoverlapping) cells or chips to
each waveform. Thus an $M = 2^k$ signaling alphabet
requires a total of DM chips for each waveform in the
alphabet to have diversity D. The signal-to-noise ratio
(SNR) per chip ($\gamma_c$) is related to the SNR per bit ($\gamma_b$) by
the expression

$$\gamma_c = \frac{k}{D}\,\gamma_b \tag{2}$$

since each waveform consists of D chips and conveys k
bits of information. The corresponding bandwidth
expansion factor is

$$B_e = \frac{DM}{k} = \frac{D\,2^k}{k}. \tag{3}$$
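
The two relations described in this paragraph follow directly from the statements that each waveform occupies D chips and conveys k bits, and that the alphabet needs DM chips in all. They can be evaluated with the short sketch below; the function names are hypothetical and the SNR values are taken as plain ratios rather than dB.

```python
import math

def chip_snr(snr_per_bit, k, diversity):
    """SNR per chip when the energy of k bits is spread over `diversity` chips."""
    return snr_per_bit * k / diversity

def orthogonal_bandwidth_expansion(k, diversity):
    """Chips occupied per information bit: D * M / k with M = 2**k."""
    return diversity * (2 ** k) / k

def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)
```

For example, with k = 2 and D = 2 the alphabet of M = 4 waveforms occupies 8 chips, so `orthogonal_bandwidth_expansion(2, 2)` evaluates to 4.0.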


The probability of a symbol error for M-ary orthogonal
signaling over a Rayleigh fading channel with diversity is
given in closed form by Hahn [9]. The expression is rather
cumbersome to evaluate, however, especially if either D or
M (or both) are large. A union bound is more convenient.
That is, for the set of M orthogonal waveforms, the
probability of a symbol (waveform) error can be
overbounded as follows:

$$P_M < (M-1)\,P_{2\,\mathrm{ortho}}(D) = (2^k - 1)\,P_{2\,\mathrm{ortho}}(D) < 2^k P_{2\,\mathrm{ortho}}(D) \tag{4}$$


where $P_{2\,\mathrm{ortho}}(D)$, the probability of error for two
orthogonal waveforms each with diversity D, is [10]

$$P_{2\,\mathrm{ortho}}(D) = p^D \sum_{j=0}^{D-1} \binom{D-1+j}{j} (1-p)^j \tag{5}$$

and where

$$p = \frac{1}{2+\gamma_c} \tag{6}$$

is the probability of error for binary FSK transmitted over
a Rayleigh fading channel with no diversity. For large
SNR, $P_{2\,\mathrm{ortho}}(D)$ is well approximated and overbounded by

$$P_{2\,\mathrm{ortho}}(D) < \binom{2D-1}{D} \frac{1}{(2+\gamma_c)^D}. \tag{7}$$

²Bounds on block code performance for a Rician channel are given in
the paper by Chase [8].

Another upper bound (the Chernoff bound) on the
probability of error for binary FSK with diversity is also
available [1], namely

$$P_{2\,\mathrm{ortho}}(D) < \tfrac{1}{2}\,[4p(1-p)]^D \tag{8}$$

where p is given in (6). A more interesting form for this
bound is obtained by noting that

$$[4p(1-p)]^D = \exp[-k\gamma_b f(\gamma_c)] \tag{9}$$

where

$$f(\gamma_c) = \frac{1}{\gamma_c} \ln\left[\frac{(2+\gamma_c)^2}{4(1+\gamma_c)}\right]. \tag{10}$$

The function $f(\gamma_c)$ reaches a maximum value of 0.149 at
$\gamma_c = 3$ (5 dB). This is the well-known result [1] that the
optimum SNR per chip for M-ary orthogonal waveforms
transmitted over a Rayleigh fading channel with diversity
is approximately 5 dB.

Waveforms Constructed from Block Codes

Consider the decoding of a waveform corresponding to
a codeword from a constant weight code of size M and
weight w. Again we denote the word error probability by
$P_M$ where the subscript denotes the M hypotheses among
which the decoder must choose. Similarly let $P_2(i,j)$
denote the probability of making the incorrect choice
between the hypotheses i and j. Then an upper bound for
$P_M$, obtained by use of a union bound [1], is

$$P_M < \frac{1}{M} \sum_{i=1}^{M} \sum_{\substack{j=1 \\ j \ne i}}^{M} P_2(i,j). \tag{11}$$

For the channel model we have postulated, $P_2(i,j)$
depends only upon the distance d between codewords i and
j and may be written as $P_2(d,i)$. Let $A(d,i)$ be the number
of codewords that are at a distance d from word i. We
then write

$$P_M < \frac{1}{M} \sum_{i=1}^{M} \sum_{d} A(d,i)\,P_2(d,i). \tag{12}$$

[...] two codewords of Hamming distance d. Since weight
distributions are tabulated for many codes of practical
interest, (12) may be evaluated exactly.

As an alternative, we can use an upper bound for
$P_2(i,j)$, and hence also for $P_M$, that applies to any
constant weight code when M is not too large. From the
discussion given in the Appendix, it follows easily that the
error probability for any binary decision between two
constant weight codewords $x_i$ and $x_j$, given that $x_i$ was
transmitted, is

[...]

where $k = \log_2 M$ information bits per codeword and $\gamma_b$ is
the SNR per bit.

Performance Results

In order to illustrate the performance gains that can be
achieved by an integrated coding/modulation approach
and soft-decision decoding, we have plotted in Fig. 2
upper bounds on the performance of four block codes,
each having a bandwidth expansion factor $B_e = 4$. The
code parameters for the Golay (48,12) code and the
Hadamard H(20,5) code were given in Section III. The
other two codes were obtained by expurgating a (52,17)
code [11] to yield the (52,13) code and a first-order
Reed-Muller (16,5) code to yield the (16,4) code. Their
parameters are given in Table VI.

The performance in Fig. 2 is given in terms of the bit
error probability $P_b$, where the approximation
$P_b \approx (1/2)P_M$ has been used to derive $P_b$ from the codeword
error probability $P_M$. The upper bound in (14) is used for


TABLE  VI 

Paxametexs  of  (52, 13)  Code  and  Fixct-Okdex  Reed- Mulls* 
Codes 


For  a linear  code  the  distance  distribution  for  any  code- 
word i is  the  same  as  the  weight  distribution.  The  depen- 
dence upon  i vanishes,  and  we  may  write  A(d)  for  the 
number  of  words  at  distance  d and  P2(d)  for  the  basic 
error  probability  involved  in  making  a decision  between 


Code  PircMttrk 

(52.13)  Cod* 

kood-tfellcr  Co 6* 

a 

52 

16 

k 

13 

6 

M 

13ttt 

30 

4a. 

16 

8 

V 

30 

8 
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Fig.  2.  Performance  of  several  codes  versus  conventional  diversity  tiM-hnirp*.. 


PM  in  conjunction  with  the  exact  form  for  Px 
«™/2)  given  in  (5). 

To compare the performance of the above codes with
orthogonal waveforms employing conventional diversity,
the curves for binary (M = 2) and quaternary (M = 4)
FSK, each with dual diversity (D = 2), are also shown in
Fig. 2. A comparison of the curves clearly illustrates the
gains in performance achieved by the coding approach.
However, the performance can be further improved by
concatenation, as we now show.

Concatenated  Code 

We consider the performance of an [N, N-1](n,k) concatenated
code. We assume the outer alphabet size q and
the number of inner codewords to be $2^k$. The number of
outer codewords, $q^{N-1}$, is too large for the union
bound in (14) to be appropriate. However, as the [N, N-1]
outer code is linear (over the field GF(q)), its exact
distance distribution may be used to evaluate (12). The
weight (or distance) distribution of the Reed-Solomon
[N, N-1] code can be determined as a special case of the
results given for maximal distance separable codes [11],
namely,

d = 1, 2, ..., N.

(16)

Note that A(1) = 0 in accord with a minimum distance of
2 for the outer code.
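As a hedged illustration of such a weight distribution, the sketch below assumes the outer code is the single-parity-check code whose N symbols sum to zero modulo q, which is the constraint used by the decoding algorithm of Section V. For that code a counting argument gives A(d) = C(N,d)[(q-1)^d + (-1)^d (q-1)]/q. This closed form is not copied from (16), which is not legible here, and the function names are illustrative. A brute-force count is included as a cross-check for small parameters.

```python
from itertools import product
from math import comb

def parity_code_weight_distribution(N, q):
    # A(d) for length-N words over {0..q-1} whose symbols sum to 0 mod q.
    return [comb(N, d) * ((q - 1)**d + (-1)**d * (q - 1)) // q
            for d in range(N + 1)]

def brute_force_weight_distribution(N, q):
    # Enumerate every word; feasible only for small N and q.
    counts = [0] * (N + 1)
    for word in product(range(q), repeat=N):
        if sum(word) % q == 0:
            counts[N - word.count(0)] += 1
    return counts

A = parity_code_weight_distribution(9, 32)   # the [9,8] outer code, q = 32
```

For the [9,8] example with q = 32 this gives A(1) = 0 and A(2) = 36 x 31 = 1116, and the A(d) sum to q^(N-1) as they must.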

Since a distance between outer codewords of d symbols
is equivalent to a distance of at least $d \cdot d_{\min}$ chip positions
between the corresponding waveforms, we have

(17)

Finally, the SNR per bit must be modified to account for
the energy used to transmit the outer parity symbol. Thus

(18)

is used in place of (15).

Fig. 3. Improvement in performance resulting from concatenation
(horizontal axis: SNR per bit in dB).

By substituting (16)-(18) and (5)
into (12) we obtain the desired bound on $P_M$. If this
bound is evaluated for increasing values of $\gamma_b$, we observe
that all terms other than the d = 2 term rapidly become
insignificant. For error rates of practical interest we may
then write


In Fig. 3 we repeat the performance of the H(20,5)
code given in Fig. 2 and also show the performance of the
concatenated code that is obtained by using the H(20,5)
as the inner code with a [9,8] outer code. At a bit error
rate of $10^{-3}$ the concatenated code is seen to require
about 4 dB less SNR. Considering that the bandwidth
expansion is increased from 4.0 to only 4.5, this is clearly
a dramatic improvement in performance.

We stated earlier that the choice of the outer code
length N was quite arbitrary. We now consider the effect
of varying this parameter. Using the H(20,5) as an inner
code, error rates were calculated for choices of N from 2
to 25. These are plotted in Fig. 4 showing the bit error rate
as a function of N with the SNR per bit
held fixed. It is seen that there is an optimal range of
values for N.

From (18) we note that for small N, an excessive
fraction of the total signal energy is expended in the parity
symbol. This explains the rapid deterioration in performance
that can be observed as N decreases towards its
minimal possible value of 2. As N increases, the ratio
N/(N-1) becomes constant. Thus (19) predicts a
quadratic increase in error rate with N. Between these
extremes there is an optimal choice of N.³ In this case, the
curves reach a minimum for N = 4. The minimum is
relatively broad, however, so that the choice of N = 9 in
the above example results in only a minor degradation in
performance.
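The two competing effects described above can be tabulated with a short sketch. It is illustrative only: the energy penalty N/(N-1) follows the discussion of (18), while the count of weight-2 outer codewords, C(N,2)(q-1), is an assumption (a counting result for a sum-to-zero parity code) used here to show the quadratic growth with N. The function names are hypothetical.

```python
from math import comb, log10

def bandwidth_expansion(n, k, N):
    # Inner (n,k) code followed by an [N, N-1] outer code.
    return (n / k) * N / (N - 1)

def parity_energy_penalty_db(N):
    # Extra energy per information bit spent on the outer parity symbol.
    return 10.0 * log10(N / (N - 1))

def weight_two_outer_codewords(N, q):
    # Assumed multiplier of the dominant (d = 2) term; grows as N^2.
    return comb(N, 2) * (q - 1)

rows = [(N, parity_energy_penalty_db(N), weight_two_outer_codewords(N, 32))
        for N in range(2, 26)]
```

For the H(20,5) inner code with N = 9, `bandwidth_expansion(20, 5, 9)` evaluates to 4.5, the figure quoted earlier in this section.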

³It should be observed that the value of the minimum and the shape of
the curve depend in part on the formula used to convert from codeword
error probability to bit error probability. Our choice tends to bias the
minimum toward smaller values of N than some other conversion
formulas such as, for example, the formula $P_b = (d_{\min}/n) P_M$.

Fig. 4. Performance of concatenated code as function of outer word length.


Binary  Convolutional  Codes 

Before concluding this section on the performance of
codes on a Rayleigh fading channel, we briefly mention
that an upper bound on the error rate performance of
binary convolutional codes with maximum likelihood decoding
can be obtained by using the generating function
of these codes as described by Viterbi [3]. For binary FSK
transmission of each bit with Lth order diversity (or
MTOOK transmission of the code modified as described
in Method 1 of Section III), an upper bound on the bit
error probability is

$$P_b < \sum_{k=d_{\mathrm{free}}}^{\infty} c_k\,P_{2,\mathrm{ortho}}(kL) \qquad (20)$$

where $P_{2,\mathrm{ortho}}(kL)$ is given by (5) with

$$\gamma_c = R\,\gamma_b / L. \qquad (21)$$

The weighting coefficients $\{c_k\}$ in the summation are
obtained from the first derivative of the generating function
of the code, as described in [3], $d_{\mathrm{free}}$ is the minimum
free distance of the code, and R is the rate of the (unmodified
or original) code. Therefore, the bandwidth expansion
factor is BE = 2L/R.

V.  Efficient  Decoding  of  High-Rate 
Concatenated  Codes 

In the last section we presented a [9,8]H(20,5) code as
an example of a concatenated code. Here $M = 2^{40}$, and full
maximum likelihood decoding of one codeword would
seem to involve a prohibitive number of operations. In
contrast, individual decoding of the eight inner codewords
(which would occur if there were no outer code and which
would yield the same information content as the decoding
of one concatenated word) requires eight decodings, each
with M = 32, for the formation of a total of only $2^8$ terms.

The problem of implementing maximum likelihood decoding
for very large codes is not a new one. Several
previous attempts have been made to design receivers that
are not strictly maximum likelihood, but whose performance
is almost as good. Various techniques for suboptimal
decoding have been proposed by Weldon [12], Forney
[13], Chase [14], Dorsch [15], and Wainberg and Wolf [16].
Recently a new technique has been proposed [17] for
achieving true maximum likelihood decoding of linear
block codes using a decoder whose complexity is governed
by the number of parity symbols (n - k) rather than by
the number of information symbols (k). This technique is
closely related to the Viterbi algorithm [3], [18] which has
previously been used for the decoding of convolutional
codes. It is particularly suited to the decoding of the
concatenated codes described in the previous section. As
shown below, application of this technique to a concatenated
code with a very large number of codewords
results in a maximum likelihood decoder whose complexity
is not much greater than that required for direct
decoding of the inner code alone.

As stated earlier, we are specifically considering an
[N, N-1] outer code. The size of the inner code is the
same as q, the size of the outer code alphabet. The
technique for decoding a concatenated word can best be
visualized as a two-level soft-decision procedure, involving
an interactive decoding of the inner and outer codes.

Maximum likelihood decoding of any codeword is a
soft-decision process in that individual (hard) bit decisions
are not initially made. That is, the values of matched filter
outputs are maintained until the final decision is made as
to the most likely codeword. Similarly the soft-decision
concept is extended to cover the two levels of coding that
form the concatenated structure. As each symbol of the
outer code (i.e., an inner codeword) is received, the q
log-likelihood terms corresponding to it are formed as in
(1). However, we do not select the greatest of these terms;
rather, we use them to form, in turn, a higher level set of
log-likelihood terms corresponding to hypotheses on the
outer codeword. The key to this technique is that it is not
necessary to evaluate hypotheses for each of the $q^{N-1}$
possible outer codewords; instead only q hypotheses need
be considered.

The outer code log-likelihood terms are developed as
each symbol is processed. They represent the q possible
values of the sum (modulo-q) of the symbols currently
received. For each of these hypotheses, the most probable
sequence of symbols satisfying the symbol sum constraint
is determined. The most probable sequence is simply the
one with the greatest arithmetic sum of inner code log-likelihood
values. The actual sequence of symbols is
stored in an associated path vector. As successive symbols
are received the log-likelihood terms and path vectors are
appropriately updated. After the last symbol in the word
the parity constraint for the outer code demands that all
hypotheses except the zeroth hypothesis be discarded. The
most likely outer codeword is then specified directly by
the symbols stored in the zeroth path vector.

This decoding procedure may be compactly expressed
by the following algorithm. We define the inner code
log-likelihood term for the ith symbol hypothesis
evaluated for the jth symbol to be v(i,j), $0 \le i \le q-1$,
$1 \le j \le N$. Similarly the (partially developed) outer code
log-likelihood term for the kth modulo-q hypothesis
evaluated over the first j symbols is V(k,j), $0 \le k \le q-1$,
$0 \le j \le N$. The path vector <P(k,j)> contains the values
(indices) of the j symbols associated with V(k,j). The
decoding algorithm is the following.

1) Initialize j = 0;
   set V(k,j) = 0 for k = 0, and $-\infty$ for k > 0;
   clear <P(k,j)>.

2) j <- j + 1.

3) If j = N, go to step 6), or else proceed.

4) Do the following for all values of k:

   a) compute

      $$V(k,j) = \max_i \left[ V((k-i)_{\mathrm{mod}\,q},\, j-1) + v(i,j) \right]$$

      where $(k-i)_{\mathrm{mod}\,q}$ is the integer (k - i) if $k \ge i$ or
      the integer (k - i + q) if k < i;

   b) form a new path vector by appending the index i
      that maximizes the above to the old path vector
      associated with $(k-i)_{\mathrm{mod}\,q}$,

      $$\langle P(k,j) \rangle = \langle P((k-i)_{\mathrm{mod}\,q},\, j-1),\, i \rangle.$$

5) Return to step 2).

6) Perform steps 4a) and 4b), but only for k = 0.

7) Stop.
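The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name, the 0-based position index, and the list-of-lists layout `v[j][i]` are choices made here. The sample values are the ones implied by the worked example that follows in the text (N = 7, q = 3).

```python
NEG_INF = float("-inf")

def decode_outer_parity(v, q):
    """v[j][i] is the inner-code log-likelihood of symbol value i at
    position j (0-based). Returns the best metric and the most likely
    symbol sequence whose sum is 0 modulo q."""
    N = len(v)
    V = [0.0] + [NEG_INF] * (q - 1)        # step 1: only hypothesis 0 is alive
    paths = [[] for _ in range(q)]
    for j in range(N):
        last = (j == N - 1)
        new_V = [NEG_INF] * q
        new_paths = [None] * q
        # step 6: on the last symbol only hypothesis k = 0 is needed
        for k in ([0] if last else range(q)):
            # step 4a: best predecessor (k - i) mod q over all symbol values i
            best_i = max(range(q), key=lambda i: V[(k - i) % q] + v[j][i])
            new_V[k] = V[(k - best_i) % q] + v[j][best_i]
            # step 4b: extend the surviving path with the chosen symbol
            new_paths[k] = paths[(k - best_i) % q] + [best_i]
        V, paths = new_V, new_paths
    return V[0], paths[0]

# Inner-code terms implied by the worked example, one row per position j.
v_example = [[1, 6, 4], [1, 2, 4], [1, 3, 4], [4, 1, 1],
             [2, 4, 1], [4, 1, 2], [3, 1, 8]]
metric, word = decode_outer_parity(v_example, 3)
```

Each symbol costs q*q additions and comparisons, independent of N, which is the point made in the complexity discussion later in this section.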

As an example, we illustrate this algorithm for the case
of N = 7 and q = 3. Assume the inner code correlator
outputs v(i,j) are already formed and are

         j = 1   2   3   4   5   6   7
v(0,j):      1   1   1   4   2   4   3
v(1,j):      6   2   3   1   4   1   1
v(2,j):      4   4   4   1   1   2   8

The development of the outer code log-likelihood terms
and resultant path sequences is shown below.


Decoding  Procedure 

a) Initially:

V(0,0) = 0
V(1,0) = −∞
V(2,0) = −∞.

b) j = 1:

V(0,1) = max (0+1, −∞+6, −∞+4) = 1
Path 0

V(1,1) = max (−∞+1, 0+6, −∞+4) = 6
Path 1

V(2,1) = max (−∞+1, −∞+6, 0+4) = 4.
Path 2

c) j = 2:

V(0,2) = max (1+1, 4+2, 6+4) = 10
Path 1-2

V(1,2) = max (6+1, 1+2, 4+4) = 8
Path 2-2

V(2,2) = max (4+1, 6+2, 1+4) = 8.
Path 1-1

d) j = 3:

V(0,3) = max (10+1, 8+3, 8+4) = 12
Path 2-2-2

V(1,3) = max (8+1, 10+3, 8+4) = 13
Path 1-2-1

V(2,3) = max (8+1, 8+3, 10+4) = 14.
Path 1-2-2

e) j = 4:

V(0,4) = max (12+4, 14+1, 13+1) = 16
Path 2-2-2-0

V(1,4) = max (13+4, 12+1, 14+1) = 17
Path 1-2-1-0

V(2,4) = max (14+4, 13+1, 12+1) = 18.
Path 1-2-2-0

f) j = 5:

V(0,5) = max (16+2, 18+4, 17+1) = 22
Path 1-2-2-0-1

V(1,5) = max (17+2, 16+4, 18+1) = 20
Path 2-2-2-0-1

V(2,5) = max (18+2, 17+4, 16+1) = 21.
Path 1-2-1-0-1

g) j = 6:

V(0,6) = max (22+4, 21+1, 20+2) = 26
Path 1-2-2-0-1-0

V(1,6) = max (20+4, 22+1, 21+2) = 24
Path 2-2-2-0-1-0

V(2,6) = max (21+4, 20+1, 22+2) = 25.
Path 1-2-1-0-1-0

h) j = 7:

V(0,7) = max (26+3, 25+1, 24+8) = 32.

The most likely outer codeword is directly determined
from the symbols stored in the zeroth path vector at the
end of the procedure: (2,2,2,0,1,0,2). We note that after
the j = 3 calculations, all path vectors have as their second
symbol the value 2. Thus at this point a "hard" decision
has been made on the second symbol. Similarly, after j = 4
the fourth symbol is fixed at the value 0, and after j = 5
the fifth symbol is fixed at 1.

The decoding algorithm as presented here involves
some awkward and time-consuming manipulations involving
the path vectors. It was presented in this manner
merely for the sake of clarity. An equivalent procedure
where the path vectors need not be moved and the symbol
indices are directly appended to them can be used instead.
(At the end of this decoding procedure, the outer codeword
is not directly available but must be extracted from
the set of all path vectors.) In this case the updating of the
path vectors for each symbol processed has almost no
impact upon the total complexity of the algorithm. The
complexity of the algorithm is then effectively that of
updating all the V(k,j) in step 4a), requiring the computation
and comparison of a total of $q^2$ terms.

To determine the increase in decoding complexity due
to concatenation (relative to use of a single code), we first
note that either scheme requires the formation of the set
of inner code log-likelihood terms. There are q of these,
each of which is the sum of w of the square-law detected
matched filter outputs. The complexity of forming these
terms can be assumed to be proportional to the number of
additions involved, which is q(w - 1). For the concatenated
code, we must next form $q^2$ terms; however,
these outer code log-likelihood terms are each the sum of
only two inner code terms. The extra computation is then
roughly proportional to only $q^2$ additions, for a total
computational measure of q(q + w - 1). The increase in
decoding complexity is seen to be a factor of roughly
[1 + q/(w - 1)], where the exact value depends upon the
particular processor chosen. This factor is even smaller
when the formation and detection of the matched filter
outputs are considered. For our previous example using
the H(20,5) inner code, the increase in decoding complexity
associated with concatenation is at most roughly 4.5.
We conclude that the decoding algorithm presented
makes the use of concatenated codes a feasible means of
constructing constant average energy waveforms as well
as an attractive method for improving the performance of
a communications system with only a single level of
coding.
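The addition counts in this comparison are easy to reproduce. The sketch below uses hypothetical function names and assumes that the H(20,5) code has weight w = 10 (half its length), a value that is not stated in this part of the text.

```python
def inner_additions(q, w):
    # q inner log-likelihood terms, each a sum of w detector outputs.
    return q * (w - 1)

def concatenated_additions(q, w):
    # Inner terms plus q*q two-term sums for the outer-code update.
    return q * (q + w - 1)

def complexity_ratio(q, w):
    # Equals 1 + q / (w - 1).
    return concatenated_additions(q, w) / inner_additions(q, w)

ratio = complexity_ratio(32, 10)   # q = 32, assumed w = 10
```

With these assumed values the ratio is 1 + 32/9, about 4.56, in line with the figure of roughly 4.5 quoted above.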

Appendix 

Minimum  Distance  and  Order  of  Diversity  for 
Constant  Weight  Binary  Block  Codes 

Let x and y be two words from a binary block code of word
length n. Let w(x) and w(y) be the weights of x and y, respectively.
Define $d_{ij}(x,y)$ for i,j = 0,1 as follows.

$d_{00}(x,y)$ = number of positions in which x has a 0 and y has a 0.

$d_{01}(x,y)$ = number of positions in which x has a 0 and y has a 1.

$d_{10}(x,y)$ = number of positions in which x has a 1 and y has a 0.

$d_{11}(x,y)$ = number of positions in which x has a 1 and y has a 1.

Note that

$$d_{00}(x,y) + d_{01}(x,y) + d_{10}(x,y) + d_{11}(x,y) = n$$
$$d_{01}(x,y) + d_{11}(x,y) = w(y)$$
$$d_{10}(x,y) + d_{11}(x,y) = w(x)$$
$$d_{01}(x,y) + d_{10}(x,y) = d_H(x,y)$$

where $d_H(x,y)$ is the (Hamming) distance between x and y,
defined by $w(x \oplus y)$ where $\oplus$ is the "exclusive or" function
(equivalently, addition over the binary field). For a constant
weight code, we have

$$w(x) = w(y) \;\Rightarrow\; d_{10}(x,y) = d_{01}(x,y) = \tfrac{1}{2}\, d_H(x,y).$$
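These identities can be checked directly on a pair of words. The following sketch is illustrative; the function name and the two sample words are chosen here and do not come from the text.

```python
def joint_counts(x, y):
    """Return a dict d with d[(a, b)] = number of positions where x has a
    and y has b, for binary words x and y of equal length."""
    d = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(x, y):
        d[(a, b)] += 1
    return d

x = [1, 1, 0, 0, 1, 0]          # weight 3
y = [0, 1, 1, 0, 1, 0]          # weight 3
d = joint_counts(x, y)
hamming = d[(0, 1)] + d[(1, 0)]
```

Because the two sample words have equal weight, d[(1, 0)] and d[(0, 1)] are each half of the Hamming distance.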

Consider the entire code consisting of M codewords $x_1, x_2, \ldots, x_M$.
One can argue that the "effective order of diversity" $D_e$ of
the code is given by the formula

$$D_e = \min_{i \ne j}\, [d_{10}(x_i, x_j)].$$

This follows from the argument that the decision between two
codewords $x_i$ and $x_j$, given that $x_i$ was transmitted, is equivalent
to making a binary decision between two hypotheses $H_i$ and $H_j$
where each hypothesis involves $d_{10}(x_i, x_j)$ terms that under $H_i$ are
identically distributed signal plus noise components and that
under $H_j$ are noise components. The number $d_{10}(x_i, x_j)$ is therefore
the effective order of diversity in this decision process. For a
code with a variable distribution of weights, the effective diversity
in reception will vary with the codeword transmitted and,
therefore, some decisions will be made with more likelihood of
error (lower diversity available) than others. This leads to a
biased decoding process whose performance is hard to define
and which intuitively can be expected to be poorer than in a
more balanced situation.

We now consider a fixed weight code, for which

$$D_e = \tfrac{1}{2} \min_{i \ne j}\, [d_H(x_i, x_j)].$$

Since the minimum in the last expression just defines the minimum
distance of the code, it follows that

$$D_e = d_{\min}/2.$$

I. Introduction

A two-user multiple-access adder channel with binary inputs
is a channel with two inputs $x_1$ and $x_2$ ($x_1, x_2 \in \{0,1\}$), a single
output y ($y \in \{0,1,2\}$), where y is given as

$$y = x_1 + x_2$$

and + indicates addition of real numbers. As functions of time
the input and output can be either sequences of symbols or
waveforms.

Two independent message streams are to be transmitted over
this channel, one via each input port. As shown in Fig. 1, the
encoders for these inputs are separate devices that are assumed
to act on their respective message streams without cooperation.
The decoder's task is to observe the channel output and to
faithfully reproduce the two message streams (or equivalently
the two channel inputs). The task of the decoder is made
difficult by the ambiguity when a one is received; this could
have been produced either by the input pair $(x_1, x_2) = (0,1)$ or by
$(x_1, x_2) = (1,0)$.

The rate of transmission of the ith encoder $R_i$ is defined to be
the ratio of the number of binary digits in the ith message stream
to the number of binary digits in the output of the ith encoder,
i = 1, 2. The capacity region for the channel is defined as the
set of rate pairs $(R_1, R_2)$ which allow an arbitrarily small error
probability in the decoder output sequences. The capacity region
for this channel is given in Fig. 2 and was derived [1]-[3] under a
certain assumption regarding the synchronization of the encoders
and of the encoders and the decoder.

The basic assumption of a multiple-access channel is that the
encoders are to operate independently of each other. Yet in the
derivations of the capacity region for this channel it was
assumed that the encoders utilized block codes and that the
encoders produced codewords that were in block and bit synchronism.
Furthermore it was assumed that the decoder was in block
and bit synchronism with the encoders. Recently [4] it was
shown that the same capacity region applies when the assumption
of block synchronism between encoders is dropped. However,
bit synchronism between encoders was still assumed as well
as block synchronism between the decoder and one encoder.

Our aim is to present some very simple codes that can be used
when synchronism is not present. We begin by retaining the
assumption of bit synchronism. Later we will show that the
codes presented perform equally well with or without bit synchronism.
We will always be concerned with codes that yield
zero probability of error after synchronization is achieved.

II. Some Codes

We first consider a class of block codes for both encoders.
Encoder 1 uses two codewords of block length K, the all-zero
word and the all-one word. The rate for this encoder is thus
1/K. Encoder 2 uses all binary codewords of length N such that
1) the first symbol is a zero and 2) the word does not contain K
consecutive ones. We will later compute the rate of this code,
but first we show that such codes lead to zero probability of
error.

Since we have assumed bit (but not block) synchronism, the
channel output is a sequence of symbols from the alphabet
{0, 1, 2}. The decoder examines the received sequence until a run
of exactly K consecutive nonzero symbols has been received.
That is, the symbol preceding this run and the symbol immediately
following this run must be a zero. Such a run can
result only from Encoder 1 sending the all-one word in block
synchronism with this run. Since such an output sequence will
occur with probability one, the decoder can establish block
synchronism with Encoder 1. The decoder then examines the entire
channel output sequence through a K-symbol window that is
in block synchronism with Encoder 1. All K bits of a word
will be nonzero if and only if Encoder 1 transmitted the all-one
word during that block. The decoder has now determined the
channel input sequence supplied by Encoder 1. The decoder
completes the decoding by subtracting this sequence from the
received sequence to obtain the sequence produced by Encoder 2.
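The decoding procedure just described can be sketched as follows for the bit-synchronous case. This is a minimal illustration with hypothetical function names. The example streams are constructed by hand (K = 3, Encoder 2 words of length 5, and Encoder 1 observed starting two symbols before a block boundary) so that a synchronizing run is present.

```python
def adder_channel(x1, x2):
    # Noiseless adder channel: symbol-wise real addition.
    return [a + b for a, b in zip(x1, x2)]

def find_block_phase(y, K):
    """Locate a run of exactly K nonzero symbols bounded by zeros and
    return its start index modulo K, or None if no such run is seen."""
    i, n = 0, len(y)
    while i < n:
        if y[i] == 0:
            i += 1
            continue
        start = i
        while i < n and y[i] != 0:
            i += 1
        if i - start == K and start > 0 and i < n:
            return start % K
    return None

def decode(y, K):
    phase = find_block_phase(y, K)
    if phase is None:
        raise ValueError("no synchronizing run observed")
    bits1, x2_hat = [], []
    for b in range((len(y) - phase) // K):
        window = y[phase + b * K: phase + (b + 1) * K]
        bit = 1 if all(s != 0 for s in window) else 0   # all-one word sent?
        bits1.append(bit)
        x2_hat.extend(s - bit for s in window)          # subtract Encoder 1
    return phase, bits1, x2_hat

K = 3
x1 = [0, 0] + [s for bit in [0, 1, 0, 1, 1, 0] for s in [bit] * K]
x2 = [0, 1, 1, 0, 0,  0, 1, 0, 0, 1,  0, 0, 1, 1, 0,  0, 1, 0, 1, 0]
y = adder_channel(x1, x2)
phase, bits1, x2_hat = decode(y, K)
```

Only the part of the output from the first full Encoder 1 block onward is decoded in this sketch; symbols before that boundary belong to a partially observed block.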

It should be noted that the same codes and decoding algorithm
will work even without bit synchronism. Now each encoder
is assumed to produce pulses of height zero or one and duration
$T_0$ seconds. The decoder begins by
looking for a waveform that is nonzero for exactly $KT_0$ seconds.
Such a waveform results only if Encoder 1 has transmitted the
all-one word in block synchronism with this $KT_0$ second segment
of the output waveform. The decoder now has established block
synchronism with Encoder 1. Looking through a window of
duration $KT_0$ seconds, the decoder can determine without error
the waveform transmitted by Encoder 1. Subtracting this waveform
from the received signal results in the waveform produced
by Encoder 2. Block synchronization to the codewords of Code
2 can be achieved by many different means at a negligible rate
loss.

To determine the rate of Encoder 2, we give a constructive
procedure for producing all the codewords satisfying the two
previously mentioned conditions. Suppose we fix K and have
already produced sets of codewords satisfying the conditions of
block length 1, 2, ..., N-1. We wish to find a code of block
length N. Let us append a prefix of a zero followed by (i - 1)
ones to the codewords of block length N - i for i = 1, 2, ..., K
and consider the entire collection of words formed in this
fashion. It can be verified that the sequences that result will

a) all be distinct,

b) all begin with a zero,

c) have no run of K consecutive ones, and

d) form all the codewords of block length N with the desired
properties.

Let $G_K(N)$ be the number of codewords of length N. Then from
the above construction, we have

$$G_K(N) = G_K(N-1) + G_K(N-2) + \cdots + G_K(N-K).$$

Such constrained binary sequences have been considered previously
[5]-[7].
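The recurrence above can be checked against a direct enumeration, and its growth rate gives the rate of Encoder 2 for long blocks. The sketch below is illustrative: the function names are hypothetical, the initial conditions (one empty word of length zero, none of negative length) are an assumption that makes the recurrence hold for small N, and the growth rate is taken as the base-2 logarithm of the root between 1 and 2 of x^K = x^(K-1) + ... + x + 1, found by bisection.

```python
from itertools import product
from math import log2

def count_codewords(N, K):
    # G_K(N) via the recurrence, with G(0) = 1 and G(n) = 0 for n < 0.
    G = {0: 1}
    for n in range(1, N + 1):
        G[n] = sum(G.get(n - i, 0) for i in range(1, K + 1))
    return G[N]

def brute_force_count(N, K):
    # Words of length N that start with 0 and contain no run of K ones.
    run = "1" * K
    return sum(1 for bits in product("01", repeat=N)
               if bits[0] == "0" and run not in "".join(bits))

def dominant_root(K, iters=60):
    # Bisection on x^K - (x^(K-1) + ... + 1), which changes sign once in (1, 2).
    lo, hi = 1.0, 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid**K - sum(mid**i for i in range(K)) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def limiting_rate_sum(K):
    # Rate 1/K for Encoder 1 plus the asymptotic rate of Encoder 2.
    return 1.0 / K + log2(dominant_root(K))
```

For K = 2 the counts are the Fibonacci numbers and the limiting rate of Encoder 2 is about 0.694; under these assumptions the rate sum is largest at K = 3.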

By standard techniques for solving linear difference equations,
$G_K(N)$ is given by

$$G_K(N) = \sum_{i=1}^{K} c_i \lambda_i^{N}$$

where the $\lambda_i$ are the K solutions to the characteristic equation

$$\lambda^{K} = \lambda^{K-1} + \lambda^{K-2} + \cdots + \lambda + 1.$$

For large N, $G_K(N)$ is then asymptotically given by

$$G_K(N) \approx c\,\lambda_{\max}^{N},$$

where $\lambda_{\max} = \max_i \lambda_i$. The limiting rate of the codes for large N is
then

$$R_2 = \lim_{N \to \infty} \frac{\log_2 G_K(N)}{N} = \log_2 \lambda_{\max}.$$

Since $\lambda_{\max} \to 2$ as $K \to \infty$, $R_2 \to 1$ for very large K. A table of the
limiting rates for small values of K is as follows.

K     R_1     R_2      R_1 + R_2
2     0.5     0.694    1.19
3     0.33    0.879    1.21
4     0.25    0.947    1.20
5     0.20    0.975    1.18

The maximum sum is obtained for K = 3.

It is also of interest to examine how the sum of the rates varies
as N increases and K is fixed. This growth in total rate is plotted
in Fig. 3 for K = 2.

An alternative coding technique is to allow one or both
encoders to utilize a variable length code. One particular version
of such a code that has the same properties as the block codes
discussed above for K = 2 is as follows. Let Encoder 1 use the
two codewords {00, 11}. The rate of the code is $R_1 = 0.5$. Let
Encoder 2 use the two codewords {0, 01}. Such a code will never
exhibit a run of two consecutive ones so that the same decoding
algorithm can be used as for the block code with K = 2. Two
different rates can be calculated for Encoder 2. If the encoder
uses the two codewords with equal probability, the rate is
$R_2 = (\text{average length of code})^{-1} = 0.66$. Otherwise, if Encoder 2 uses the
codeword 0 with probability p, the average rate of the code is
$R_2 = [-p \log_2 p - (1-p) \log_2 (1-p)] / [p + 2(1-p)]$. The maximum
rate occurs for $p = (\sqrt{5} - 1)/2$, or $\max_p R_2 = 0.694$. The
resulting rate sum is thus the same as that of the limiting rate of
the block code. These results are tabulated as follows.

                               R_1     R_2      R_1 + R_2
Equal probability codewords    0.5     0.666    1.17
p = (sqrt(5) - 1)/2            0.5     0.694    1.19

Variable rate codes can be produced that are analogous to the
block codes for K > 2.

Conclusion

It is natural to consider codes that do not require synchronization
for the binary input adder channel for L users where L > 2.
The codes that have been found do not compare well with the
maximum sum of the rates predicted by the capacity region for
the channel, which for large L is approximately [8]

$$(R_1 + R_2 + \cdots + R_L)_{\max} \approx \tfrac{1}{2} \log_2 (\pi e L / 2).$$

It remains an open problem to find good codes that do not
require synchronization for L > 2.

A SHORTENED VITERBI DECODING ALGORITHM
FOR  TERMINATED  RATE  1/N  CONVOLUTIONAL  CODES  WITH  HARD  DECISIONS* 

By 

Dev  V,  Gupta** 
and 

Jack  Keil  Wolf*** 

ABSTRACT 

An algorithm for maximum likelihood decoding of terminated rate 1/N
convolutional codes with hard decisions is presented which is based upon,
but is simpler than, the Viterbi algorithm. The algorithm makes use of an
algebraic description of convolutional codes introduced by Massey. For
reasonable values of the probability of error the algorithm is shown to
produce substantial savings in decoding complexity as compared with the
Viterbi algorithm.

I. INTRODUCTION

A decoding  algorithm  is  developed  for  terminated  rat 
codes  which  is  based  upon  an  algebraic  description  of  sucl 
by  Massey  [1].  This  algorithm  can  be  applied  to  a receivi 
components  from  the  same  alphabet  as  the  transmitted  code 
ithm  uses  the  Viterbi  [2]  decoding  algorithm  as  an  essenti 
it  is  simpler  than  directly  applying  the  Viterbi  algorithit 
manner. 

The  basic  steps  in  the  algorithm  are  as  follows: 

Step 1. A code word which is easily calculated from the received vector
is subtracted from the received vector, leaving a vector which
ends in a stream of zeros.

Step 2. The Viterbi decoding algorithm is applied to the vector found
in Step 1, resulting in a tentative code word. Since the vector
to be decoded ends in a stream of zeros, a short cut can be
applied to the Viterbi decoding algorithm to produce this code
word.

Step 3. The code word used in Step 1 is added to the tentative code
word found in Step 2 to yield the maximum likelihood code word.

The savings in decoding complexity occurs in Step 2, where a short
cut is applied to the Viterbi algorithm. In this step, the Viterbi
decoding algorithm is applied until one comes to the string of terminating
zeros. From that point, the algorithm immediately produces the tentative
code word. The efficiency of this technique depends upon the length of
the terminating string of zeros. We will show that the length of this
string of zeros is no less than the number of error free digits
at the end of the transmission of the terminated convolutional code word.

II. FORMULATION AND NOTATION

Let the polynomial representation of a sequence of elements from
GF(q), $a_0, a_1, \ldots, a_\ell$, be $a(x) = a_0 + a_1 x + \cdots + a_\ell x^\ell$.
Here x is an indeterminate whose powers indicate the ordering of the
elements in the original sequence.

A terminated convolutional code over GF(q) of rate 1/N has as its
code words all sequences of elements from GF(q) whose polynomial
representation is of the form $C(x) = a(x^N)\, g(x)$. Here g(x) is a fixed
polynomial of degree r and a(x) is any polynomial of degree k or less. The
coefficients of g(x) are determined by the tap connections of the encoder,
and the sequence of elements corresponding to the polynomial a(x)
can be considered as the information sequence which drives the encoder.

The method by which the polynomial g(x) can be obtained from the encoder
can be deduced from Figure 1. Here $g_i(x)$, $i = 1, 2, \ldots, N$, are the
impulse responses of finite memory filters, the outputs of which are
multiplexed to produce the code word. We will assume that $g_N(x)$ has maximum
degree of all such filters; that is,

$$v = \deg[g_N(x)] \ge \deg[g_i(x)], \quad i = 1, 2, \ldots, N,$$

where deg [ ] indicates the degree of the polynomial contained in the
square bracket. Thus

$$r = \deg[g(x)] = Nv + N - 1.$$

We note that we are dealing with terminated convolutional codes, which
are a special form of linear block codes. The maximum degree of C(x) is then

$$\max \deg[C(x)] = Nk + r = N(k+v+1) - 1,$$

where the maximum is taken over all input sequences a(x). The block
length, n, of the resultant codes is then

$$n = N(k+v+1).$$

The linearity property of the codes will be utilized in a later development.
Specifically, if $C_1(x)$ and $C_2(x)$ are polynomials corresponding to any two
code words and if $\alpha$ and $\beta$ are any elements from GF(q), then
$\alpha C_1(x) + \beta C_2(x)$ is a polynomial corresponding to a code word.

Again referring to Figure 1, the encoding procedure used to produce a
terminated code is to input (k+1) information symbols (corresponding to the
coefficients of a(x)) into the encoder and then input a string of v zeros to
clear the memory of the filters. This string of input zeros should not be
confused with the stream of zeros referred to in Step 2 of the decoding
algorithm. They are different creatures.
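
The encoding rule $C(x) = a(x^N)\, g(x)$ can be illustrated with a short Python sketch for the binary case. It assumes that multiplexing the filter outputs corresponds to $g(x) = \sum_i x^{i-1} g_i(x^N)$, which agrees with $r = Nv + N - 1$ but is not stated explicitly in the text. The function names and the rate 1/2 example filters are hypothetical, and polynomials are stored as coefficient lists with the lowest power first.

```python
def poly_mul_gf2(a, b):
    """Multiply two GF(2) polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out


def interleave_generator(filters):
    """Assumed form g(x) = sum_i x^(i-1) g_i(x^N) for N multiplexed filters."""
    N = len(filters)
    v = max(len(f) for f in filters) - 1      # largest filter degree
    g = [0] * (N * v + N)                     # degree r = N*v + N - 1
    for i, f in enumerate(filters):
        for m, c in enumerate(f):
            g[m * N + i] = c
    return g


def encode(info, filters):
    """Form C(x) = a(x^N) g(x) for k+1 information bits (the list `info`)."""
    N = len(filters)
    a_up = [0] * ((len(info) - 1) * N + 1)    # coefficients of a(x^N)
    for m, c in enumerate(info):
        a_up[m * N] = c
    return poly_mul_gf2(a_up, interleave_generator(filters))


# Hypothetical rate 1/2 example: g_1(x) = 1 + x + x^2, g_2(x) = 1 + x^2, k = 2.
codeword = encode([1, 0, 1], [[1, 1, 1], [1, 0, 1]])
print(codeword, len(codeword))
```

For this example the code word has length N(k+v+1) = 10, and it matches what a shift register encoder produces when fed the three information bits followed by v = 2 zeros.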


Figure  1.  Encoder  for  Convolutional  Code 
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III.  THE  SHORTENED  DECODING  ALGORITHM 

We assume that corresponding to a transmitted code polynomial C(x),
we have a received polynomial R(x) whose coefficients are from GF(q). We
define the error polynomial E(x) for a given received R(x) and transmitted
C(x) as

$$E(x) \triangleq R(x) - C(x).$$

The  decoder  observes  only  the  received  polynomial  R(x)  and  must  deduce 
which  code  word  was  most  likely  to  have  been  transmitted.  For  a channel 
whose  errors  are  independent  of  the  symbols  transmitted,  the  decoder  must 
choose the most probable error pattern based upon observing R(x). Any decoding
algorithm which results in the most probable code word is a maximum
likelihood  decoding  algorithm.  The  Viterbi  decoding  algorithm  is  known  to 
be  such  an  algorithm. 

Sometimes  ties  occur  in  decoding  in  that  several  code  words  have  the 
same  maximum  conditional  probability.  In  that  case  a maximum  likelihood 
decoder  can  produce  any  one  of  these  code  words.  In  the  discussion  to 
follow, this situation will be ignored and we will always assume the decoder
produces a unique maximum likelihood code word. However, our procedure
works whether ties occur or not.

Let a maximum likelihood decoding algorithm decode R(x) to the code
word C*(x). Let R'(x) = R(x) - C'(x) where C'(x) is any code word in the
truncated convolutional code. Then we have

Lemma 1: A maximum likelihood decoding algorithm will decode R'(x) to the
code word C*(x) - C'(x).

Proof: Let $C_i(x)$, $i = 1, 2, \ldots, q^{k+1}$, be the $q^{k+1}$ code words in the
truncated convolutional code. Having received R(x), the decoder chooses
the most probable error pattern from the set

$$\{R(x) - C_i(x)\}_{i=1}^{q^{k+1}}.$$

By assumption this is the error pattern R(x) - C*(x). Having received
R'(x) = R(x) - C'(x), the decoder must choose the most probable error
pattern from the set

$$\{R'(x) - C_i(x)\}_{i=1}^{q^{k+1}} = \{R(x) - C'(x) - C_i(x)\}_{i=1}^{q^{k+1}} = \{R(x) - C_i(x)\}_{i=1}^{q^{k+1}}. \tag{1}$$

Equation (1) should be interpreted in terms of set equivalence, the last
equivalence resulting from the linearity of the code words. Since we
know that the most likely error pattern from this set is R(x) - C*(x)
= R'(x) + C'(x) - C*(x), then the maximum likelihood code word based upon
observing R'(x) is C*(x) - C'(x). Q.E.D.


Lemma  1 is  the  basis  for  Steps  1 and  3 in  our  decoding  algorithm. 
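
Lemma 1 can be checked by brute force on a small binary code. The sketch below is only an illustration under stated assumptions: polynomials over GF(2) are stored as Python integers (bit i holds the coefficient of $x^i$), the channel is taken to be a binary symmetric channel so that maximum likelihood decoding is minimum Hamming distance decoding, and the example code (generated by a hypothetical g(x) and $x^2 g(x)$, block length 8) is not taken from the paper. Because ties can occur, the check compares distances rather than code words.

```python
def span_gf2(generators):
    """All GF(2) linear combinations of the generators (ints, bit i <-> x^i)."""
    words = {0}
    for gen in generators:
        words |= {w ^ gen for w in words}
    return sorted(words)


def weight(x):
    """Hamming weight of a binary polynomial stored as an int."""
    return bin(x).count("1")


def min_distance_to_code(received, codewords):
    return min(weight(received ^ c) for c in codewords)


def ml_decode(received, codewords):
    """Minimum Hamming distance decoding; ties go to the smallest integer."""
    return min(codewords, key=lambda c: (weight(received ^ c), c))


def lemma1_holds(codewords, n):
    """Check that C*(x) - C'(x) is a maximum likelihood decision for R'(x)."""
    for R in range(2 ** n):
        c_star = ml_decode(R, codewords)
        for c_prime in codewords:
            R_prime = R ^ c_prime          # subtraction is XOR over GF(2)
            candidate = c_star ^ c_prime   # C*(x) - C'(x)
            best = min_distance_to_code(R_prime, codewords)
            if weight(R_prime ^ candidate) != best:
                return False
    return True


# Hypothetical code: N = 2, v = 2, k = 1, g(x) = 1 + x + x^2 + x^4 + x^5.
code = span_gf2([0b110111, 0b110111 << 2])
print(lemma1_holds(code, 8))
```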


We next concern ourselves with a method of choosing the code word
C'(x) which is to be subtracted from the received vector in Step 1 of our
algorithm. Our criterion will be to find the code word C'(x) such that
R(x) - C'(x) is of least degree. The reason for this choice will be
apparent later. We first note that from the Euclidean division algorithm
R(x) can be written as

$$R(x) = q(x)\, g(x) + r(x), \quad \deg[r(x)] < \deg[g(x)], \tag{2}$$

where q(x) and r(x) are respectively the quotient and remainder upon
dividing R(x) by g(x). We next let $q_1(x)$ be the sum of all terms in q(x)
having powers which are multiples of N (including $x^0$). Then

$$q_1(x) = q_0(x^N), \tag{3}$$

and

$$q(x) = q_1(x) + (q(x) - q_1(x)), \tag{4}$$

or

$$q(x) = q_0(x^N) + q_2(x). \tag{5}$$

Substituting (5) into (2) results in

$$R(x) = q_0(x^N)\, g(x) + q_2(x)\, g(x) + r(x). \tag{6}$$

We note that $q_0(x^N)\, g(x)$ is a code word in the terminated convolutional
code.
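
The decomposition in (2) through (6) is easy to carry out mechanically. The following Python sketch does so for the binary case only, storing GF(2) polynomials as integers (bit i holds the coefficient of $x^i$). The function names and the numerical example are hypothetical and serve only to show how C'(x), R'(x) and the degree of R'(x) could be computed.

```python
def gf2_divmod(num, den):
    """Quotient and remainder of GF(2) polynomials stored as ints (bit i <-> x^i)."""
    q = 0
    dlen = den.bit_length()
    while num.bit_length() >= dlen:
        shift = num.bit_length() - dlen
        q |= 1 << shift
        num ^= den << shift
    return q, num


def gf2_mul(a, b):
    """Product of two GF(2) polynomials stored as ints."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out


def preprocess(R, g, N):
    """Return (C', R', degree of R') following equations (2) to (6)."""
    q, _r = gf2_divmod(R, g)
    # Keep only the terms of q(x) whose powers are multiples of N: q1(x) = q0(x^N).
    mask = sum(1 << i for i in range(0, max(q.bit_length(), 1), N))
    q1 = q & mask
    C_prime = gf2_mul(q1, g)
    R_prime = R ^ C_prime                  # subtraction is XOR over GF(2)
    degree = R_prime.bit_length() - 1      # -1 is used here when R'(x) = 0
    return C_prime, R_prime, degree


# Hypothetical example: N = 2, g(x) = 1 + x + x^2 + x^4 + x^5, k = 2.
# R(x) is the code word (1 + x^4) g(x) with a single error at x^7.
g = 0b110111
R = gf2_mul(0b10001, g) ^ (1 << 7)
print(preprocess(R, g, 2))
```

In this example the degree of R'(x) is 6, one less than the degree of the error polynomial $x^7$, so the highest coefficients of R'(x) are zero as intended.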

Lemma 2:

$$\deg[R(x) - q_0(x^N)\, g(x)] = \min_i \deg[R(x) - C_i(x)]. \tag{7}$$

Proof: Assume the contrary. That is, assume there exists a code word
$C(x) = a(x^N)\, g(x)$, $a(x^N) \ne q_0(x^N)$, such that

$$\deg[R(x) - a(x^N)\, g(x)] < \deg[R(x) - q_0(x^N)\, g(x)]. \tag{8}$$

From (6),

$$R(x) - a(x^N)\, g(x) = [q_0(x^N) - a(x^N)]\, g(x) + q_2(x)\, g(x) + r(x). \tag{9}$$

Since $q_0(x^N) - a(x^N)$ contains only powers of x that are multiples of N
and $q_2(x)$ contains none, their leading terms cannot cancel, so that

$$\begin{aligned}
\deg[R(x) - a(x^N)\, g(x)]
&= \max\{\max[\deg[q_0(x^N) - a(x^N)],\ \deg[q_2(x)]] + \deg[g(x)],\ \deg[r(x)]\} \\
&\ge \max\{\deg[q_2(x)] + \deg[g(x)],\ \deg[r(x)]\} \\
&= \deg[R(x) - q_0(x^N)\, g(x)],
\end{aligned} \tag{10}$$

which is the desired contradiction. Q.E.D.

Lemma 2 gives us the code polynomial $C'(x) = q_0(x^N)\, g(x)$ to be
subtracted from R(x) to yield the maximum string of terminating zeros in the
resultant polynomial. We now examine the details of the Viterbi decoding
algorithm to see how to take advantage of this string of terminating zeros.

It is known that the code words in a terminated convolutional code
can be associated with paths through a trellis [3] having nodes $V_{ij}$,
$i = 1, 2, \ldots, S = q^{Nv}$, $j = 0, 1, 2, \ldots, T = k+v+1$. $V_{ij}$ is called
the ith state of the trellis at a depth j. In general, many code words have
paths that pass through a given node $V_{ij}$. To perform maximum likelihood
decoding, one must compare R(x) with every code word and compute a metric
D(R(x), C(x)) between R(x) and C(x). The maximum likelihood code word is that
code word with the smallest metric. (The metric is chosen to be inversely
proportional to the conditional probability of C(x) given R(x).) For many
channels, this metric is the sum of coefficient metrics between respective
coefficients of R(x) and C(x). That is, for such channels, if
$R(x) = \sum_\ell R_\ell x^\ell$ and $C(x) = \sum_\ell C_\ell x^\ell$, then
$D(R(x), C(x)) = \sum_\ell d(R_\ell, C_\ell)$. In this case, the Viterbi
algorithm has been used to save computation in finding the maximum
likelihood code word. The basic idea is that the metrics for all code words
are computed to a depth j in the trellis (that is, up to the coefficient of
$x^{jN-1}$ in R(x) and C(x)) as j is stepped through the values 1, 2, ..., T.
If two or more code words have paths that pass through the same state
$V_{ij}$, only that code word with the smallest metric remains in contention
for the maximum likelihood code word.

We now examine a short cut for the Viterbi algorithm when it is
used to decode R'(x) which is known to have all zero coefficients for
$j = \alpha+1, \alpha+2, \ldots, (k+v+1)N - 1$. That is, $\deg[R'(x)] = \alpha$.
Define $\beta = \lceil (\alpha+1)/N \rceil$, where $\lceil x \rceil$ is the smallest
integer greater than or equal to x. For each state $V_{i\beta}$ we compute

$$Q_{i\beta} \triangleq \min \sum_{j=\beta+1} d(0, C_j(V_{i\beta})),$$

the sum extending to the end of the trellis, where $C_j(V_{i\beta})$ are the
coefficients of a code word with a path passing through the state
$V_{i\beta}$ and the minimum is taken over all such code words. To
decode R'(x), we use the standard Viterbi algorithm to decode to a
depth $j = \beta$ in the trellis. For $i = 1, 2, \ldots, S$, we add the bias
$Q_{i\beta}$ to the partial metric computed for the ith node at that depth.
The maximum likelihood code word for R'(x) is then that code word
corresponding to the smallest total metric (that is, partial metric plus
bias).
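
The short cut can be sketched in Python for a binary code with Hamming distance as the metric. This is a minimal sketch under several assumptions that are not from the paper: it uses the conventional shift register trellis with $2^v$ states (which may differ from the trellis of [3]), it recomputes the biases on each call rather than tabulating them in advance, and all function names and the example filters are hypothetical.

```python
from itertools import product
from math import ceil, inf


def step(state, bit, filters):
    """One branch: state = (u[t-1], ..., u[t-v]); returns (next state, N output bits)."""
    reg = (bit,) + state
    out = tuple(sum(c & reg[m] for m, c in enumerate(f)) % 2 for f in filters)
    return reg[:-1], out


def tail_biases(beta, k, v, filters):
    """Q[s] = (min weight, inputs) over paths from state s at depth beta to the end."""
    T = k + v + 1
    states = list(product((0, 1), repeat=v))
    Q = {s: ((0, []) if not any(s) else (inf, [])) for s in states}
    for t in range(T - 1, beta - 1, -1):       # branch t joins depth t to depth t+1
        bits = (0, 1) if t <= k else (0,)      # the last v inputs are forced to zero
        new_Q = {}
        for s in states:
            best = (inf, [])
            for b in bits:
                nxt, out = step(s, b, filters)
                cost = sum(out) + Q[nxt][0]
                if cost < best[0]:
                    best = (cost, [b] + Q[nxt][1])
            new_Q[s] = best
        Q = new_Q
    return Q


def shortened_viterbi(r_prime, k, v, filters):
    """Decode R' (list of N(k+v+1) bits); returns (metric, encoder inputs, beta)."""
    N = len(filters)
    alpha = max((i for i, b in enumerate(r_prime) if b), default=-1)
    beta = ceil((alpha + 1) / N)
    survivors = {(0,) * v: (0, [])}            # state -> (partial metric, inputs)
    for t in range(beta):                      # ordinary Viterbi up to depth beta
        block = r_prime[t * N:(t + 1) * N]
        bits = (0, 1) if t <= k else (0,)
        nxt_survivors = {}
        for s, (metric, inputs) in survivors.items():
            for b in bits:
                nxt, out = step(s, b, filters)
                m = metric + sum(x != y for x, y in zip(block, out))
                if nxt not in nxt_survivors or m < nxt_survivors[nxt][0]:
                    nxt_survivors[nxt] = (m, inputs + [b])
        survivors = nxt_survivors
    Q = tail_biases(beta, k, v, filters)       # bias for every state at depth beta
    total, inputs = min(
        ((m + Q[s][0], path + Q[s][1]) for s, (m, path) in survivors.items()),
        key=lambda item: item[0],
    )
    return total, inputs, beta


# Hypothetical rate 1/2 example with k = 2, v = 2 and R'(x) of degree 6.
filters = ((1, 1, 1), (1, 0, 1))
print(shortened_viterbi([1, 1, 0, 1, 0, 1, 1, 0, 0, 0], 2, 2, filters))
```

Here the trellis depth is T = 5 but the forward search stops at depth 4, after which the stored bias selects the best completion of each surviving path.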

Combining all of these results we have the following decoding
algorithm.

Step 1. For a given received R(x), compute q(x) and r(x) where

$$R(x) = q(x)\, g(x) + r(x), \quad \deg[r(x)] < \deg[g(x)].$$

Let $q_1(x)$ be the sum of all terms in q(x) having powers which are
multiples of N. Compute

$$C'(x) = q_1(x)\, g(x)$$

and then R'(x) = R(x) - C'(x). Let $\deg[R'(x)] = \alpha$ and let
$\beta = \lceil (\alpha+1)/N \rceil$.

Step 2. Decode R'(x) using the short cut described above. This requires
decoding only to depth $\beta$ in the trellis and adding a fixed bias to each
state. Call the result of this decoding C''(x).

Step 3. Compute

$$C^*(x) = C'(x) + C''(x).$$

From the previous lemmas, it should be clear that C*(x) is the
maximum likelihood code word corresponding to the received polynomial R(x).

IV.  SAVINGS  IN  DECODING  COMPLEXITY  (Binary  Symmetric  Channel) 

When no transmission errors occur (or when the error polynomial is
itself a code word), R'(x), as defined in Step 1 of the previous algorithm,
is equal to zero. No further decoding need then be done. Thus we
distinguish between two cases, case 1 when R'(x) = 0 and case 2 when R'(x) ≠ 0.

Case 1: R'(x) = 0

Since the Viterbi algorithm must search to the end of the trellis
(depth T = k + v + 1), whereas the shortened algorithm requires no effort,
the fractional saving in computation and storage = 1. Also the probability
that R'(x) = 0 is exactly the probability of undetected error which can be
expressed in terms of the weight distribution of the code words. Thus, when
case 1 is true, the expected computation and storage savings are

$$S_1 = 1 \cdot \Pr[\text{undetected error}] \ge \Pr[\text{no transmission error}]. \tag{11}$$

For a binary symmetric channel with error probability p,

$$S_1 \ge (1-p)^{N(k+v+1)}. \tag{12}$$

Case 2: R'(x) ≠ 0

We first note that

$$\alpha = \deg[R'(x)] \le \deg[E(x)]. \tag{13}$$

Here R'(x) is as defined in Step 1 of the previous algorithm and E(x) is
the polynomial corresponding to the actual error pattern which occurred
in the channel. This inequality follows from Lemma 2 and the fact that
R(x) = C(x) + E(x).

For  the  binary  symmetric  channel  with  bit  error  probability  p, 

$$\Pr[\deg E(x) = a] = p\,(1-p)^{N(k+v+1)-a-1}, \quad a \ne 0. \tag{14}$$


Since in the ordinary Viterbi algorithm, we decode to a depth T = (k+v+1)
and in Step 2 of the shortened algorithm we decode to a depth
$\beta = \lceil (\alpha+1)/N \rceil$, we define the percent savings $S_2$ as

$$S_2 = \frac{T - \beta}{T}. \tag{15}$$

$S_2$ is a random variable and we are interested in computing a lower bound
to its average value. Taking averages in (15) we have

$$\bar{S}_2 = \frac{T - \bar{\beta}}{T} = \frac{(k+v+1) - \bar{\beta}}{k+v+1}. \tag{16}$$


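But

$$\beta = \left\lceil \frac{\deg[R'(x)] + 1}{N} \right\rceil \le \frac{1}{N} \deg[E(x)] + 1, \tag{17}$$

so that taking averages we obtain

$$\bar{\beta} \le \frac{1}{N}\, \overline{\deg[E(x)]} + 1. \tag{18}$$

Using (14) it is a simple calculation to compute $\overline{\deg[E(x)]}$ to be

$$\overline{\deg[E(x)]} = N(k+v+1) - 1 - \cdots \tag{19}$$

[remaining terms of (19) illegible in the source]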
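
The averages in (16) and (18) can be evaluated numerically from the distribution in (14). The Python sketch below is only an illustration: it sums (14) directly rather than using a closed form, it includes the term a = 0 (which does not affect the mean), and the parameter values in the example are hypothetical.

```python
def degree_distribution(p, n):
    """Pr[deg E(x) = a] = p (1-p)^(n-a-1) for a = 0, ..., n-1, following (14)."""
    return [p * (1 - p) ** (n - a - 1) for a in range(n)]


def mean_degree(p, n):
    """Average of deg E(x) obtained by summing a * Pr[deg E(x) = a]."""
    return sum(a * pr for a, pr in enumerate(degree_distribution(p, n)))


def savings_lower_bound(p, N, k, v):
    """Lower bound on the average savings obtained by combining (16) and (18)."""
    T = k + v + 1
    n = N * T                                  # block length
    beta_bound = mean_degree(p, n) / N + 1     # right side of (18)
    return (T - beta_bound) / T                # substituted into (16)


# Hypothetical parameters: rate 1/2, v = 2, k = 20, bit error probability 0.01.
print(savings_lower_bound(0.01, 2, 20, 2))
```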


Thus  a lower  bound  to  the  average  savings  is 
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ABSTRACT 

ASPECTS  OF  THE  ASYMPTOTIC  EQUIPARTITION  PROPERTY 


(September  1976) 

Robert  W.  Bernal,  B.E.E.,  City  College  of  New  York 
M.S.E.E.,  Polytechnic  Institute  of  Brooklyn 
Ph.D.,  University  of  Massachusetts 

Directed  by:  Professor  Jack  Keil  Wolf 

Let $Z = \{ z = (\ldots, Z_0, Z_1, \ldots) : Z_t \in (1, 2, \ldots, L) \}$ be a set
of sequences of random letters chosen from a finite alphabet; let
Pr(·) be a probability measure defined on the minimal σ-field over
the cylinders of Z. This discrete time-parameter, finite alphabet
(dtfa) stochastic process is said to have the asymptotic equipartition
property (AEP) if the limit in probability as n → ∞ of the
random variable (rv) $-\frac{1}{n} \ln \Pr(Z_{t+1}, Z_{t+2}, \ldots, Z_{t+n})$ equals the same
constant for any fixed t. A dtfa process is termed regular with
respect to an arbitrary rv defined on it, if the time average of
the rv is equal to some constant a.e.

It is shown that a dtfa process has the AEP if it is strict-sense
Markov-q and it is regular with respect to the rv
$-\ln \Pr(Z_t \mid Z_{t-1}, \ldots, Z_{t-q})$; note that stationarity of the process
is not required. When q = 0, the process has independent letters;
in this case it is shown that possession of the AEP, regularity with
respect to $-\ln \Pr(Z_t)$, and convergence of the time average of
$E\{-\ln \Pr(Z_t)\}$ are all equivalent. From this, independent dtfa
processes having the AEP are easily recognized and constructed; examples
given include time multiplexed information sources. Then a simple
method is presented for generating Markov-q processes with the AEP,
by applying any independent dtfa process with the AEP as the input
to certain linear sequential circuits.

Ziv's necessary and sufficient condition for a dtfa process to
possess the AEP is examined with regard to regular Markov-q processes.
Application of this criterion to these processes is deemed to be
difficult enough that no clear course is apparent which will lead
to the preceding results. The necessity for uniform convergence in
the statement of Ziv's Theorem is established. A missing portion
of the proof of the converse part of the theorem is supplied.

Joint independent regular processes are considered. The
joint process must have the AEP, and the two component processes are
shown to also have the property. Sets of typical sequence-pairs
associated with the AEP are defined. A partition of an array of
these sequence-pairs whose existence is implicitly shown by Cover
provides a background for the introduction of a new partition of the
array. Existence of the new partition is seen to follow from the
existence of a lower bound on the number of a certain set of typical
sequences.

To establish that the desired lower bound exists, the idea of
compositional typicality is formulated. It is proved that if
compositional typicality is demanded of the typical sequence-pairs, the
lower bound may be determined. Using this bound, the existence of
the newly introduced partition is then demonstrated.
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D.  Gupta,  "An  Algebraic  Description  and  Syndrome  Decoding  for 
Rate  1/N  Convolutional  Codes,"  Ph.D.  Dissertation,  University 
of  Massachusetts,  September  1977. 

Abstract 

An algebraic theory has been developed for rate 1/N convolutional codes
over GF(q). The first part of this theory develops an algebraic description
for rate 1/N convolutional codes by providing system theoretic insight
to a description previously proposed by Massey. The second part utilizes
the description developed in the first part to develop a two-dimensional
syndrome definition for this class of codes. It is shown that the impulse
response of the encoding circuit is an adequate characterization of the
convolutional code generated by it. Using some properties that result as
a by-product of the algebraic description, a technique is developed to
obtain the minimal encoder that will generate a given rate 1/N
convolutional code.

The syndrome definition is used to develop maximum likelihood
decoding schemes for rate 1/N, binary convolutional codes when the code being
used is terminated after k + 1 information digits and transmission is over
a Binary Symmetric Channel.

The first scheme preprocesses the received polynomial, y(x), to find
the least degree element in the same coset as y(x). It is shown that
trellis decoding this least degree element to obtain the maximum likelihood
error polynomial leads to fewer computations and less memory storage than
the conventional Viterbi Algorithm. This algorithm, called the Shortened
Viterbi Algorithm, thus achieves computational and storage savings relative
to the Viterbi Algorithm. In many situations, it may be used to increase
the rate of data communication due to the increased rate at which decoding
can be done. For a rate 1/2 code and practical channels, these savings
can amount to over 30%, provided k is appropriately chosen. It is also
shown that for a rate 1/N convolutional code the savings drop off as the
reciprocal of the expected number of channel errors when k is made large.

Algebraic maximum likelihood syndrome decoding is developed for a
class of Quotient Decodable rate 1/2 convolutional codes. This scheme
basically decodes the received polynomial, y(x), by algebraic
manipulations of its syndrome. As this scheme is not a search technique, it has
the potential for very fast decoding.

The feasibility of algebraic sequential decoding is also demonstrated
for general rate 1/N, finitely terminated convolutional codes. Again,
the idea is to draw suitable inferences from the syndrome of the received
polynomial, y(x).

Finally, using the decoding table approach developed by Slepian,
expressions are derived for the bit error and word error probability of
linear, binary group codes. This theory, when applied to rate 1/N,
finitely terminated convolutional codes, shows that even though the word
error probability → 1 as k → ∞, the bit error probability does not. An
upper bound on the word error probability is also derived and shown to
approach 1 as k → ∞.
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ABSTRACT 
Data  Compression 
(September  1,  1978) 

Joong Soo Ma, B.S., Yonsei University
M.S., University of Massachusetts, Ph.D., University of Massachusetts
Directed by: Professor Jack K. Wolf

This  dissertation  treats  several  problems  related  to  the  encoding  of 
messages  with  known  and  unknown  statistics.  Among  these  problems  are 
finding  the  Huffman  encoder  leading  to  the  minimum  encoding  delay,  the 
noiseless  source  coding  of  i.i.d.  sequences  of  unknown  statistics,  and 
the  data  compression  of  individual  sequences  with  or  without  distortion. 

A mathematical formula for the encoding delay in a Huffman encoder is
derived and an algorithm which minimizes this delay is presented.
Simulation shows that the variance of the time to encode and transmit the
codeword of a message is relatively insensitive to the compression ratio. The
mean of the encoding delay is much less than that resulting from
conventional encoding schemes.

An adaptive version of Shannon's encoding algorithm is presented and
is shown to be asymptotically optimal for i.i.d. sources of unknown
statistics. The Huffman algorithm and the Tunstall algorithm are utilized to
compress i.i.d. sources of unknown statistics. Simulation results indicate
a rapid convergence to the minimum values as predicted by the entropy.

A modified version of the adaptive Tunstall algorithm is shown to be
identical to the Lempel-Ziv algorithm. The adaptive Huffman algorithm is
modified in a similar manner and its performance is compared through
simulation with that of the Lempel-Ziv algorithm for i.i.d. sources and a
class of Markov sources.

The Lempel-Ziv algorithm is utilized to compress individual sequences
with distortion. The sequence entropy of an individual sequence is
defined and an algorithm which lowers the sequence entropy by deliberately
introducing distortion is presented with simulation results.


